  • Hybrid Mixture-of-Experts Architecture
  • Lightning Attention Mechanism
  • 1 Million Token Context Window
 
MiniMax claims new M1 model needs half the compute of DeepSeek-R1

Shanghai-based AI startup MiniMax has launched MiniMax-M1, its first open-source reasoning model that reportedly requires only half the computing power of rival DeepSeek-R1 for reasoning tasks with generation lengths under 64,000 tokens, according to the South China Morning Post.

Curated by curioustheo · 3 min read
Sources:

  • South China Morning Post: "DeepSeek rival MiniMax says its first AI reasoning model halves compute of R1" (scmp.com)
  • The Information: "China’s MiniMax to Release Open-Source AI Reasoning Model" (theinformation.com)
  • GitHub: "MiniMax-M1, the world's first open-weight, large-scale ..." (github.com)
Hybrid Mixture-of-Experts Architecture

The Hybrid Mixture-of-Experts (MoE) architecture represents an evolution in AI model design that balances the advantages of both dense and sparse MoE approaches. Unlike traditional MoE models that rely entirely on sparse expert networks, hybrid architectures strategically combine dense layers with sparse MoE components to optimize performance and efficiency.1 This approach addresses one of the key challenges of pure MoE systems—the communication overhead that occurs when routing tokens to different experts, which can become a bottleneck in distributed computing environments.

The hybrid design offers several compelling benefits: it maintains the quality improvements of having multiple specialized experts while reducing the all-to-all communication costs that plague fully sparse architectures.1 By incorporating both paradigms, these models can achieve better inference quality without dramatically increasing computational demands. This architectural innovation is particularly relevant for large language models seeking to scale efficiently, as it allows developers to selectively apply the sparse MoE approach only where it provides the greatest benefit, while using traditional dense layers elsewhere in the network.
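To make the idea concrete, here is a minimal NumPy sketch of a hybrid stack in which only some feed-forward blocks are sparse MoE layers with top-1 routing while the rest stay dense. The layer sizes, the routing rule, and the every-other-layer placement are illustrative assumptions, not MiniMax's actual design.

```python
# Toy sketch of a hybrid dense / sparse-MoE feed-forward stack (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 4

def dense_ffn(x, w1, w2):
    # Standard dense feed-forward block: every token uses the same weights.
    return np.maximum(x @ w1, 0.0) @ w2

def moe_ffn(x, router_w, experts):
    # Sparse MoE block: each token is routed to its top-1 expert only.
    logits = x @ router_w                      # (tokens, n_experts)
    choice = logits.argmax(axis=-1)            # chosen expert index per token
    out = np.zeros_like(x)
    for e, (w1, w2) in enumerate(experts):
        mask = choice == e
        if mask.any():                         # only the routed tokens touch expert e
            out[mask] = dense_ffn(x[mask], w1, w2)
    return out

def make_ffn(sparse):
    if sparse:
        router_w = rng.normal(size=(d_model, n_experts)) * 0.02
        experts = [(rng.normal(size=(d_model, d_ff)) * 0.02,
                    rng.normal(size=(d_ff, d_model)) * 0.02) for _ in range(n_experts)]
        return lambda x: moe_ffn(x, router_w, experts)
    w1 = rng.normal(size=(d_model, d_ff)) * 0.02
    w2 = rng.normal(size=(d_ff, d_model)) * 0.02
    return lambda x: dense_ffn(x, w1, w2)

# Hybrid stack: sparse MoE only in every other layer, dense layers elsewhere.
layers = [make_ffn(sparse=(i % 2 == 1)) for i in range(8)]

x = rng.normal(size=(16, d_model))             # 16 tokens
for layer in layers:
    x = x + layer(x)                           # residual connection
print(x.shape)                                 # (16, 64)
```

In a distributed setting, the tokens selected inside moe_ffn would have to be shipped to whichever device hosts each expert; that shuffle is the all-to-all communication cost the hybrid layout tries to contain by using sparse layers only where they pay off.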

8 sources, including ibm.com, huggingface.co, and wandb.ai
Lightning Attention Mechanism

Lightning Attention is a groundbreaking linear attention mechanism that maintains constant training speed across various sequence lengths while using fixed memory consumption12. Unlike traditional linear attention implementations that struggle with cumulative summation operations (cumsum) in causal settings, Lightning Attention employs a divide-and-conquer strategy that splits attention calculations into two components: intra-blocks using conventional attention computation, and inter-blocks utilizing linear attention kernel tricks23. This hybrid approach eliminates the need for cumsum operations that typically hinder performance.
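To make the intra-block/inter-block split concrete, the sketch below implements block-wise causal linear attention in NumPy under simplifying assumptions (single head, no normalization or decay terms, no tiling or IO-aware kernels). It illustrates the divide-and-conquer strategy rather than the actual Lightning Attention kernel.

```python
# Minimal block-wise causal linear attention in the spirit of the
# intra-block / inter-block split (a sketch, not the optimized kernel).
import numpy as np

def blockwise_linear_attention(q, k, v, block=64):
    # q, k, v: (seq_len, d). Single head, no scaling or normalization for clarity.
    n, d = q.shape
    out = np.zeros_like(v)
    kv_state = np.zeros((d, v.shape[1]))        # running sum of k_s^T v_s from earlier blocks
    for start in range(0, n, block):
        stop = min(start + block, n)
        qb, kb, vb = q[start:stop], k[start:stop], v[start:stop]
        # Inter-block part: contribution of all previous blocks via the fixed-size state.
        inter = qb @ kv_state
        # Intra-block part: conventional masked attention scores inside the block.
        scores = qb @ kb.T
        mask = np.tril(np.ones((stop - start, stop - start)))
        intra = (scores * mask) @ vb
        out[start:stop] = inter + intra
        # Fold this block into the state; memory stays fixed regardless of seq_len.
        kv_state += kb.T @ vb
    return out

# Tiny check against the naive O(n^2) causal linear attention.
rng = np.random.default_rng(1)
n, d = 200, 16
q, k, v = rng.normal(size=(3, n, d))
naive = np.tril(q @ k.T) @ v
assert np.allclose(blockwise_linear_attention(q, k, v, block=64), naive)
```

The point of the split is visible in the code: no cumulative sum over the full sequence is needed, and the kv_state carried between blocks has a fixed size, so memory does not grow with sequence length.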

The mechanism is further optimized through:

  • Tiling techniques in both forward and backward passes to maximize GPU hardware efficiency2

  • IO-aware implementation that leverages high bandwidth memory (HBM) and on-chip SRAM for optimized memory access patterns24

  • Lightning Attention-2, an evolution of the original algorithm, which enables handling of unlimited sequence lengths in large language models without compromising speed56

This technology has been successfully implemented in models like MiniMax-01, which achieves context lengths of up to 1 million tokens by pairing Lightning Attention with softmax attention at an 8:1 ratio7.
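Assuming the 8:1 ratio quoted above (the exact MiniMax-01 layer layout may differ), the interleaving pattern can be sketched as:

```python
# Illustrative attention-layer schedule for a hybrid stack, assuming the 8:1
# lightning-to-softmax ratio cited above; the real layout may differ.
def attention_schedule(n_layers, lightning_per_softmax=8):
    return ["softmax" if (i + 1) % (lightning_per_softmax + 1) == 0 else "lightning"
            for i in range(n_layers)]

schedule = attention_schedule(18)
print(schedule.count("lightning"), schedule.count("softmax"))  # 16 2
```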

10 sources, including arxiv.org, github.com, and raw.githubusercontent.com
1 Million Token Context Window

The 1 million token context window represents a revolutionary advancement in large language model capabilities, dramatically expanding the amount of information these systems can process simultaneously. This massive context capacity enables models to handle approximately 50,000 lines of code, 8 complete novels, or 200+ podcast episode transcripts in a single prompt1. Models featuring this capability include Google's Gemini 1.5 Pro, OpenAI's GPT-4.1, Meta's Llama 4 Maverick, and Alibaba's Qwen2.5-1M—the first open-source model to achieve this milestone23.
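As a rough sanity check on those figures, the conversion rates below are assumptions rather than numbers from the article, but they show how a 1 million token budget could break down:

```python
# Back-of-envelope token budget for a 1M-token context window.
# The per-item conversion rates are rough assumptions, not figures from the article.
CONTEXT = 1_000_000

TOKENS_PER_CODE_LINE = 20        # assumed average for typical source code
TOKENS_PER_NOVEL = 125_000       # assumed ~90k-word novel at ~1.4 tokens per word
TOKENS_PER_TRANSCRIPT = 5_000    # assumed transcript of one podcast episode

print(CONTEXT // TOKENS_PER_CODE_LINE)    # ~50,000 lines of code
print(CONTEXT // TOKENS_PER_NOVEL)        # 8 novels
print(CONTEXT // TOKENS_PER_TRANSCRIPT)   # 200 episode transcripts
```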

This expanded context window transforms AI applications across industries by overcoming traditional limitations that required context-management techniques like truncation, summarization, or RAG implementations1. Legal professionals can now analyze thousands of pages of case law simultaneously, financial analysts can evaluate decades of market data in one query, and AI assistants can maintain conversational memory across extensive interactions3. The technology behind these advancements often involves innovative attention mechanisms that solve the quadratic scaling problem of traditional transformer architectures, allowing models to efficiently process and reason over unprecedented amounts of text45.
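The quadratic-scaling problem is easy to see numerically: full softmax attention produces an n x n score matrix per head, while a linear-attention-style mechanism carries only a fixed-size state. The comparison below uses assumed sizes (single head, 128-dimensional, 2-byte values) and ignores optimizations such as FlashAttention that avoid storing the full matrix, though the quadratic arithmetic cost remains.

```python
# Memory for attention intermediates: full softmax score matrix vs. a fixed
# linear-attention state. Sizes are illustrative assumptions (single head, fp16).
BYTES = 2
d_head = 128

for n in (8_000, 128_000, 1_000_000):
    softmax_scores = n * n * BYTES               # n x n attention score matrix
    linear_state = d_head * d_head * BYTES       # fixed-size k^T v state
    print(f"n={n:>9,}  softmax ~{softmax_scores / 1e9:10.1f} GB   linear ~{linear_state / 1e6:.3f} MB")
```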

8 sources, including reddit.com, blog.google, and codingscape.com
Related
How does MiniMax-M1 achieve half the compute of DeepSeek-R1 for reasoning tasks
What makes Lightning Attention suitable for handling 1 million token contexts
How will MiniMax-M1's hybrid MoE architecture improve large language model efficiency
Discover more
Meta seeks $29B to fund massive AI data center expansion
Meta Platforms is in advanced discussions with major private equity firms to raise $29 billion for expanding its artificial intelligence data centers across the United States, according to multiple reports Friday. The social media giant plans to structure the funding as $3 billion in equity and $26 billion in debt. The fundraising effort represents the latest escalation in Meta's infrastructure...
Meta poaches four key OpenAI researchers for AI team
Meta has successfully recruited four key researchers from OpenAI to join its artificial intelligence superintelligence team, marking a notable victory in CEO Mark Zuckerberg's aggressive campaign to attract top AI talent with compensation packages reportedly exceeding $100 million. The social media giant hired Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai from OpenAI's Zurich office,...
DeepSeek delays R2 AI model as CEO deems performance lacking
Chinese artificial intelligence startup DeepSeek has postponed the launch of its highly anticipated R2 model, with CEO Liang Wenfeng reportedly dissatisfied with the system's current performance, according to multiple reports Thursday. The delay comes as the company faces mounting pressure from U.S. export restrictions that have created severe shortages of Nvidia server chips needed to deploy...
Google launches open-source Gemini CLI for developers
Google unveiled Gemini CLI on Wednesday, an open-source artificial intelligence tool that brings the company's Gemini AI models directly into developers' command-line terminals. The launch represents Google's latest effort to compete with similar AI coding assistants from OpenAI and Anthropic in the rapidly expanding market for AI-powered development tools. The tool allows developers to make...