Spanish AI firm Multiverse Computing has secured a €189 million ($215 million) Series B funding round led by Bullhound Capital to scale its groundbreaking CompactifAI technology, which can reduce the size of large language models by up to 95% while maintaining performance and cutting inference costs by 50-80%.
CompactifAI leverages quantum-inspired tensor networks to compress LLMs in a way that goes beyond traditional methods like quantization and pruning. Rather than simply removing neurons, the technology compresses the "correlation space" within models by decomposing weight matrices into Matrix Product Operators (MPOs). This approach allows for more controlled compression while maintaining model integrity, resulting in up to 95% size reduction with only 2-3% precision loss.
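To make the MPO idea concrete, the sketch below splits a single dense weight matrix into two tensor-network cores via a truncated SVD, the basic operation behind this style of compression. It is a generic illustration rather than Multiverse's actual pipeline; the matrix size, index factorization, and rank cap are arbitrary choices for the example.

```python
# Minimal MPO-style compression of one weight matrix via truncated SVD.
# Illustrative only: shapes and max_rank are example choices, not CompactifAI's.
import numpy as np

def mpo_compress(W, out_dims, in_dims, max_rank):
    """Split W (out x in) into two MPO cores with a bounded bond dimension."""
    o1, o2 = out_dims          # factorization of the output dimension
    i1, i2 = in_dims           # factorization of the input dimension
    # Reshape into a 4-index tensor and group (o1, i1) against (o2, i2).
    T = W.reshape(o1, o2, i1, i2).transpose(0, 2, 1, 3).reshape(o1 * i1, o2 * i2)
    U, S, Vt = np.linalg.svd(T, full_matrices=False)
    r = min(max_rank, len(S))  # truncating the bond is where compression happens
    core1 = (U[:, :r] * S[:r]).reshape(o1, i1, r)   # left MPO core
    core2 = Vt[:r, :].reshape(r, o2, i2)            # right MPO core
    return core1, core2

def mpo_reconstruct(core1, core2):
    """Contract the cores back into a dense matrix to check the approximation."""
    o1, i1, _ = core1.shape
    _, o2, i2 = core2.shape
    T = np.einsum('air,rbj->abij', core1, core2)    # shape (o1, o2, i1, i2)
    return T.reshape(o1 * o2, i1 * i2)

# Random weights are far less compressible than trained ones; this only
# demonstrates the mechanics and the parameter-count savings.
W = np.random.randn(1024, 1024)
c1, c2 = mpo_compress(W, out_dims=(32, 32), in_dims=(32, 32), max_rank=64)
W_hat = mpo_reconstruct(c1, c2)
print(f"parameters kept: {(c1.size + c2.size) / W.size:.1%}")
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.2f}")
```

The bond dimension (`max_rank`) is the knob that trades reconstruction quality against size; in practice, compressed models are usually given a brief round of fine-tuning to recover quality lost in the truncation.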
The technology works through a multi-step process: layer sensitivity profiling identifies which layers can be compressed more aggressively, and tensorization then replaces the trainable weights with MPOs. This not only makes models 4-12x faster but also reduces energy consumption and enables deployment across diverse hardware environments, from cloud infrastructure to edge devices like phones, PCs, and even a Raspberry Pi. Compressed versions of leading open-source LLMs, including Llama, DeepSeek, and Mistral, are available through AWS Marketplace, with pricing based on input and output tokens.
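As a rough sense of what a sensitivity-profiling step could look like, the sketch below scores each weight matrix by how much of its spectral energy a fixed rank budget retains; layers that are well captured at low rank are candidates for more aggressive compression. The metric, threshold, and layer names are illustrative assumptions, not details of CompactifAI's profiler.

```python
# Hypothetical layer-sensitivity scan: rank layers by retained spectral energy
# at a candidate rank budget. The 0.9 threshold and layer names are made up.
import numpy as np

def retained_energy(W, rank):
    """Fraction of squared singular values kept by a rank-'rank' truncation."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s[:rank] ** 2).sum() / (s ** 2).sum()

layers = {                       # stand-ins for a model's weight matrices
    "attention.q_proj": np.random.randn(512, 512),
    "mlp.up_proj": np.random.randn(512, 2048),
}
budget = 64                      # candidate bond dimension / rank cap
for name, W in layers.items():
    e = retained_energy(W, budget)
    plan = "compress aggressively" if e > 0.9 else "compress conservatively"
    print(f"{name}: keeps {e:.1%} of energy at rank {budget} -> {plan}")
```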
While traditional lossless compression methods like ZipNN can reduce model sizes by 33-50% without any accuracy loss, Multiverse's CompactifAI technology pushes further, reaching roughly 95% compression while largely preserving performance. The reported 2-3% precision loss represents a trade-off that makes AI deployment significantly more accessible and cost-effective. For context, the precision loss here describes how much the compressed model's accuracy, the share of predictions it gets right, drops relative to the original, so this small degradation leaves the compressed models reliable for real-world applications.
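A back-of-the-envelope calculation shows why the gap between roughly 50% lossless compression and 95% compression matters in practice. The 70-billion-parameter, 16-bit model used here is a hypothetical reference point; the reduction figures are the ones quoted above.

```python
# Rough storage comparison for a hypothetical 70B-parameter model in FP16.
params = 70e9
bytes_per_param = 2                       # 16-bit weights
base_gb = params * bytes_per_param / 1e9  # ~140 GB uncompressed
for label, reduction in [("lossless (ZipNN, ~50%)", 0.50),
                         ("CompactifAI (~95%)", 0.95)]:
    print(f"{label}: about {base_gb * (1 - reduction):.0f} GB, down from {base_gb:.0f} GB")
```

At roughly 7 GB instead of 140 GB, such a model moves from multi-GPU serving into the territory of a single accelerator or a well-equipped edge device.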
The implications of this compression-to-performance ratio are substantial for enterprise adoption. While corporations typically expect 99+% accuracy from human employees, CompactifAI's approach demonstrates that a slight reduction in precision can deliver large efficiency gains without compromising essential functionality. This balance comes from the quantum-inspired tensor network approach, which targets the model's correlation space rather than simply reducing parameters through conventional techniques like pruning or quantization. The result addresses the fundamental challenge of deploying large AI models in resource-constrained environments while preserving their core capabilities.
Multiverse Computing's breakthrough in AI compression stems from quantum principles applied to classical computing problems. Its quantum-inspired approach leverages tensor networks, mathematical structures originally developed for quantum physics, to identify and preserve the essential correlations within AI models while eliminating redundancies. Unlike traditional compression methods that simply remove parameters or lower numerical precision, this technique restructures the model's internal architecture to maintain performance with significantly fewer resources.
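The toy comparison below illustrates the underlying principle: a matrix whose entries are strongly correlated (here, low rank plus a little noise) can be truncated with almost no error, while an unstructured random matrix cannot. It is a generic demonstration of why correlation structure enables this kind of compression, not a depiction of Multiverse's method.

```python
# Correlated (low-rank + noise) vs. unstructured matrices under rank truncation.
import numpy as np

def truncation_error(W, rank):
    """Relative Frobenius error after keeping only the top singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_hat = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
    return np.linalg.norm(W - W_hat) / np.linalg.norm(W)

rng = np.random.default_rng(0)
correlated = rng.standard_normal((512, 32)) @ rng.standard_normal((32, 512)) \
             + 0.01 * rng.standard_normal((512, 512))
unstructured = rng.standard_normal((512, 512))
print("correlated matrix,   rank-32 error:", round(truncation_error(correlated, 32), 4))
print("unstructured matrix, rank-32 error:", round(truncation_error(unstructured, 32), 4))
```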
The quantum-inspired methodology has applications beyond LLMs, showing promising results in computer vision as well. Frameworks like QIANets demonstrate how quantum-inspired pruning, tensor decomposition, and annealing-based matrix factorization can reduce CNN inference times by 50-70% while maintaining accuracy comparable to the original models. This versatility makes quantum-inspired compression valuable across the AI ecosystem, enabling deployment in resource-constrained environments from edge devices to industrial settings where computational efficiency is critical.
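As a sketch of how tensor decomposition applies to CNNs, the example below replaces a convolutional layer with a rank-limited k x k convolution followed by a 1 x 1 convolution, initialized from a truncated SVD of the original kernel. QIANets combines this kind of decomposition with pruning and annealing-based matrix factorization; only the decomposition step is shown here, and the rank is an illustrative choice rather than a value from the paper.

```python
# Low-rank factorization of a Conv2d into (k x k, rank r) + (1 x 1) convs,
# initialized from a truncated SVD of the original weights. Illustrative only.
import torch
import torch.nn as nn

def factorize_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    c_out, c_in, k, _ = conv.weight.shape
    W = conv.weight.detach().reshape(c_out, c_in * k * k)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    r = min(rank, S.numel())
    sqrt_s = S[:r].sqrt()
    first = nn.Conv2d(c_in, r, k, padding=conv.padding, bias=False)
    second = nn.Conv2d(r, c_out, 1)
    with torch.no_grad():
        first.weight.copy_((sqrt_s[:, None] * Vh[:r]).reshape(r, c_in, k, k))
        second.weight.copy_((U[:, :r] * sqrt_s).reshape(c_out, r, 1, 1))
        second.bias.copy_(conv.bias if conv.bias is not None
                          else torch.zeros(c_out))
    return nn.Sequential(first, second)

conv = nn.Conv2d(64, 128, 3, padding=1)
compressed = factorize_conv(conv, rank=16)
orig = sum(p.numel() for p in conv.parameters())
new = sum(p.numel() for p in compressed.parameters())
print(f"parameters: {orig} -> {new} ({1 - new / orig:.0%} fewer)")
```

In practice the factorized layers would be fine-tuned briefly so the network recovers any accuracy lost in the truncation.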