Xiaomi's MiMo-V2.5 price drop of up to 99% is primarily driven by **significant technical optimizations in inference efficiency** and **strategic market positioning** rather than a simple discount. Here are the key reasons:
## Technical Optimizations
- **SGLang HiCache Integration**: Integrating SGLang's HiCache cut data movement across the KV cache hierarchy (GPU VRAM, CPU RAM, SSD) to **one-seventh of previous levels** and increased cacheable token capacity **fivefold**.
- **Hybrid Attention (Sliding Window + Global)**: Sliding Window Attention (SWA) layers, which attend only to a local window of recent tokens, are interleaved with global attention layers. This reduces KV cache usage while preserving long-context capability (1M-token context window).
- **Multi-Token Prediction (MTP)**: A 3-layer MTP module predicts several tokens ahead, speeding up decoding.
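The idea behind a hierarchical KV cache can be illustrated with a toy model: instead of discarding KV blocks when GPU memory fills up, they are demoted to slower tiers and promoted back on a hit, avoiding recomputation. The sketch below is a minimal illustration under stated assumptions. `TieredPrefixCache`, its per-tier LRU policy, and the block-count capacities are all hypothetical and are not SGLang's HiCache API or design. It counts inter-tier transfers but does not reproduce Xiaomi's one-seventh or fivefold figures.

```python
from collections import OrderedDict


class TieredPrefixCache:
    """Hypothetical multi-tier cache mapping a prefix key to a KV block.

    Each tier is LRU-ordered. A block evicted from a faster tier is demoted
    to the next tier instead of being discarded; a hit in a slower tier
    promotes the block back to tier 0 (the fastest, e.g. GPU VRAM).
    """

    def __init__(self, capacities):
        # capacities: max blocks per tier, fastest first (e.g. GPU, CPU, SSD)
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = list(capacities)
        self.moves = 0  # number of block transfers between tiers

    def _insert(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        if len(tier) > self.capacities[level]:
            old_key, old_val = tier.popitem(last=False)  # least recently used
            if level + 1 < len(self.tiers):
                self.moves += 1  # demotion to the next, slower tier
                self._insert(level + 1, old_key, old_val)
            # otherwise the block falls out of the slowest tier entirely

    def get(self, key):
        """Return (value, tier_it_was_found_in), or (None, None) on a miss."""
        for level, tier in enumerate(self.tiers):
            if key in tier:
                value = tier.pop(key)
                if level > 0:
                    self.moves += 1  # promotion back to the fastest tier
                self._insert(0, key, value)
                return value, level
        return None, None

    def put(self, key, value):
        # Drop any stale copy in other tiers, then insert at the fastest tier.
        for tier in self.tiers:
            tier.pop(key, None)
        self._insert(0, key, value)


if __name__ == "__main__":
    cache = TieredPrefixCache([2, 2, 4])
    for k in "abcde":
        cache.put(k, f"kv-{k}")
    print(cache.get("a"), cache.moves)
```

With capacities `[2, 2, 4]`, the oldest block `"a"` ends up on the slowest tier, and reading it promotes it while cascading other blocks downward. A real system would also weigh transfer bandwidth per tier, which this toy ignores.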
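Why interleaving SWA with global attention shrinks the KV cache comes down to arithmetic. A global layer caches keys and values for every token in the context, while an SWA layer only needs the last `window` tokens. The sketch below assumes standard per-layer K/V caching. Every model parameter in the example (48 layers, one global layer per 6, a 128-token window, 8 KV heads, head dimension 128, 2-byte elements) is hypothetical, chosen for illustration, and is not MiMo-V2.5's published configuration.

```python
def kv_cache_bytes(context_len, n_layers, global_every, window,
                   n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    Every `global_every`-th layer is a global-attention layer that caches the
    full context; all other layers are sliding-window layers that cache at
    most `window` tokens. global_every=1 means every layer is global.
    """
    # K and V tensors per token per layer
    per_token_per_layer = 2 * n_kv_heads * head_dim * bytes_per_elem
    total = 0
    for layer in range(n_layers):
        is_global = layer % global_every == global_every - 1
        cached_tokens = context_len if is_global else min(context_len, window)
        total += cached_tokens * per_token_per_layer
    return total


if __name__ == "__main__":
    ctx = 1_000_000  # 1M-token context, as in the text above
    # Hypothetical model shape, for illustration only.
    full = kv_cache_bytes(ctx, 48, 1, 128, 8, 128)    # all layers global
    hybrid = kv_cache_bytes(ctx, 48, 6, 128, 8, 128)  # 5 SWA : 1 global
    print(f"all-global: {full / 1e9:.1f} GB, hybrid: {hybrid / 1e9:.1f} GB")
    print(f"hybrid / all-global = {hybrid / full:.3f}")
```

Under these assumptions the hybrid layout needs roughly one-sixth of the all-global footprint at 1M tokens, because the SWA layers' cost stays fixed at the window size while only the global layers grow with context length.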
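A common way an MTP module speeds up decoding is by acting as a cheap draft model for speculative decoding: it proposes several tokens, and the main model keeps the longest prefix it agrees with. The source does not say this is exactly how MiMo-V2.5 uses its MTP module, so treat the following as an assumption. The sketch shows one greedy draft-and-verify step. `draft` and `target_next` are hypothetical stand-ins for the MTP head and the main model. For clarity the main model is called once per position, whereas real systems verify all drafted positions in a single batched forward pass.

```python
from typing import Callable, List


def speculative_step(prefix: List[int],
                     draft: Callable[[List[int], int], List[int]],
                     target_next: Callable[[List[int]], int],
                     k: int) -> List[int]:
    """One greedy draft-and-verify step; returns the tokens accepted.

    draft(prefix, k) proposes k tokens (stand-in for an MTP head).
    target_next(seq) returns the main model's greedy next token for seq.
    The output always matches what greedy decoding with the main model alone
    would produce, and at least one token is accepted per step.
    """
    proposal = draft(prefix, k)
    accepted: List[int] = []
    seq = list(prefix)
    for tok in proposal:
        expected = target_next(seq)
        if tok != expected:
            accepted.append(expected)  # main model's correction ends the step
            return accepted
        accepted.append(tok)
        seq.append(tok)
    accepted.append(target_next(seq))  # bonus token when every draft matched
    return accepted


# Toy models: the "main model" always predicts (last token + 1) mod 10.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(prefix: List[int], k: int) -> List[int]:
    # Correct for the first two positions, deliberately wrong afterwards.
    out, last = [], prefix[-1]
    for i in range(k):
        last = (last + 1) % 10
        out.append(last if i < 2 else (last + 5) % 10)
    return out


if __name__ == "__main__":
    print(speculative_step([3], toy_draft, toy_target, 3))
```

Here the draft's first two tokens are accepted and the main model supplies the third, so one step yields three tokens instead of one. The speed-up depends on how often the draft agrees with the main model.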
## Market & Strategic Reasons
1. **Competitive Pressure**: DeepSeek V4-Pro's permanent 75% discount pushed Xiaomi to match its pricing exactly (both charge $0.0036 per million cached-input tokens).
2. **Developer Ecosystem Growth**: Xiaomi is prioritizing developer habit formation over near-term API margins, raising Token Plan quotas **5-8×** and resetting existing usage.
3. **Ecosystem Strategy**: MiMo serves as a strategic wedge across Xiaomi's hardware ecosystem (phones, vehicles, robotics) rather than a standalone profit center.
4. **Demonstrated Demand**: The 100-trillion-token "MiMo Orbit" incentive program concluded successfully, demonstrating market demand.
## Pricing Impact
- **MiMo-V2.5-Pro**: $0.0036 per million cached-input tokens, $0.435 per million cache-miss input tokens, $0.87 per million output tokens
- **Base MiMo-V2.5**: $0.0028 per million cached-input tokens, $0.28 per million output tokens (matching DeepSeek V4-Flash economics)
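Because cached input is priced so far below cache misses (about 120× cheaper for MiMo-V2.5-Pro), the effective bill depends heavily on the cache hit rate. The sketch below uses only the MiMo-V2.5-Pro per-million-token prices listed above; the base model is left out because no cache-miss price is given for it. The function names and the example workload (10M cached, 2M cache-miss, 1M output tokens) are hypothetical.

```python
# USD per million tokens for MiMo-V2.5-Pro, as listed above.
PRO_PRICES = {"cached_input": 0.0036, "cache_miss_input": 0.435, "output": 0.87}


def request_cost(cached_input: int, cache_miss_input: int, output: int,
                 prices: dict = PRO_PRICES) -> float:
    """Total cost in USD for the given token counts."""
    return (cached_input * prices["cached_input"]
            + cache_miss_input * prices["cache_miss_input"]
            + output * prices["output"]) / 1_000_000


def blended_input_price(hit_rate: float, prices: dict = PRO_PRICES) -> float:
    """Effective USD per million input tokens at a given cache hit rate."""
    return (hit_rate * prices["cached_input"]
            + (1 - hit_rate) * prices["cache_miss_input"])


if __name__ == "__main__":
    print(f"${request_cost(10_000_000, 2_000_000, 1_000_000):.3f}")
    for rate in (0.0, 0.5, 0.9, 0.99):
        print(f"hit rate {rate:.0%}: ${blended_input_price(rate):.4f} per M input tokens")
```

For the example workload the total is about $1.78, and almost all of it comes from cache misses and output rather than the 10M cached tokens. This is why the HiCache capacity gains described earlier matter for pricing.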
## Limitations
While the technical optimizations are substantial, Western enterprise adoption faces **geopolitical barriers**, including data residency and cybersecurity concerns, even though the models are open-source (MIT-licensed).
The price drop reflects a broader industry trend where inference is becoming a **deflationary commodity**, with value shifting to specialized agentic workflows and multi-model routing strategies.