Xiaomi MiMo-V2.5 Series API Gets Permanent Price Cut, Up to 99% Off
Amid the escalating AI model price wars, Xiaomi announced on May 27 a permanent price cut for the API of its MiMo-V2.5 large model series. At the same time it is overhauling the billing system, with the stated aim of further lowering developers' calling costs through technical improvements.

I. Significant API Price Cuts — Up to 99% Off
The price change took effect globally at 00:00 Beijing Time on May 27. It applies to the two core versions, MiMo-V2.5 and MiMo-V2.5Pro. Pricing no longer varies with context window length, which makes the price structure simpler and more transparent.
| Model Version | Input Cache Hit Price | Maximum Discount | Output Price | Maximum Discount |
| --- | --- | --- | --- | --- |
| MiMo-V2.5Pro | 0.025 yuan per million tokens | Up to 99% off | 6 yuan per million tokens | Up to 86% off |
| MiMo-V2.5 | 0.02 yuan per million tokens | Up to 98% off | 2 yuan per million tokens | Up to 93% off |

II. Billing System Upgrade — More Value at No Extra Cost
Beyond the direct API price cuts, Xiaomi has heavily optimized its Token Plan billing system:
Larger Quota: With plan prices unchanged, the usable token quota has increased to 5 to 8 times the previous amount.
Simplified Rules: The introduction of Credits replaces the previous complex billing methods, making token consumption and cost calculation more intuitive for developers.
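
To make the per-million-token prices from Section I concrete, here is a minimal Python sketch that estimates the cost of a call and back-calculates the previous price implied by a stated discount. The prices are copied from the article; the names `Price`, `PRICES`, `estimate_cost`, and `implied_previous_price` are illustrative and not part of any Xiaomi SDK. The article gives no price for input tokens that miss the cache, so the sketch covers only cache-hit input and output tokens. It does not model Credits, whose conversion rate the article does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Prices in yuan per million tokens, as listed in the article."""
    input_cache_hit: float
    output: float


# Assumption: only the two prices the article lists are modeled.
# The cache-miss input price is not given in the article and is omitted.
PRICES = {
    "MiMo-V2.5Pro": Price(input_cache_hit=0.025, output=6.0),
    "MiMo-V2.5": Price(input_cache_hit=0.02, output=2.0),
}


def estimate_cost(model: str, cached_input_tokens: int, output_tokens: int) -> float:
    """Return the cost in yuan for cache-hit input tokens plus output tokens."""
    price = PRICES[model]
    total = cached_input_tokens * price.input_cache_hit + output_tokens * price.output
    return total / 1_000_000


def implied_previous_price(new_price: float, percent_off: float) -> float:
    """Return the pre-cut price implied by a discount of `percent_off` percent."""
    return new_price / (1 - percent_off / 100)


if __name__ == "__main__":
    cost = estimate_cost("MiMo-V2.5Pro", cached_input_tokens=2_000_000, output_tokens=500_000)
    print(f"Estimated cost: {cost:.2f} yuan")
    print(f"Implied earlier cache-hit price: {implied_previous_price(0.025, 99):.2f} yuan")
```

For 2 million cache-hit input tokens and 500,000 output tokens on MiMo-V2.5Pro, the estimate comes to about 3.05 yuan, almost all of it from output tokens. Because the article quotes discounts as "up to" figures rounded to whole percentages, the implied earlier price (roughly 2.5 yuan per million tokens for a 99% cut to 0.025 yuan) is only an approximation of the highest previous tier.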

III. Technical Foundation — How Can It Keep Lowering Prices?
Xiaomi's official statement attributes these deep price cuts to technical breakthroughs in its underlying inference system architecture:
SWA Inference Optimization: SGLang HiCache now fully supports Sliding Window Attention (SWA), which has cut the volume of data transferred among GPU memory, CPU memory, and SSD to one-seventh of the previous level.
Improved Cache Efficiency: The number of cacheable tokens has increased nearly fivefold compared to the earlier optimized version, boosting cache hit rates and dramatically lowering per-inference cost.
Cluster Throughput Optimization: With the introduction of expert parallelism for the mixture-of-experts (MoE) model and input length bucketing strategies, the cluster's input throughput has risen sharply, which lets the service maintain quality while steadily reducing cost per token.
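
The article does not describe how Xiaomi implements input length bucketing, so the following Python sketch shows only the general idea under a simplified model: each batch is padded to its longest input, and grouping requests of similar length cuts the padding. The bucket bounds in `BUCKET_BOUNDS` and the function names are hypothetical, chosen for illustration only.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in tokens; the article gives no real values.
BUCKET_BOUNDS = [512, 2048, 8192, 32768]


def bucket_for(length, bounds=BUCKET_BOUNDS):
    """Return the smallest bucket bound that can hold an input of `length` tokens."""
    index = bisect_left(bounds, length)
    if index == len(bounds):
        raise ValueError(f"input of {length} tokens exceeds the largest bucket")
    return bounds[index]


def group_by_bucket(lengths, bounds=BUCKET_BOUNDS):
    """Group input lengths by bucket so each batch holds similar-sized inputs."""
    groups = defaultdict(list)
    for length in lengths:
        groups[bucket_for(length, bounds)].append(length)
    return dict(groups)


def padded_tokens(batches):
    """Total tokens processed if every batch is padded to its longest input."""
    return sum(max(batch) * len(batch) for batch in batches if batch)


if __name__ == "__main__":
    lengths = [100, 300, 1500, 1800, 7000, 30000]
    print("Real tokens:      ", sum(lengths))
    print("One mixed batch:  ", padded_tokens([lengths]))
    print("Bucketed batches: ", padded_tokens(group_by_bucket(lengths).values()))
```

In this toy example the six requests contain 40,700 real tokens. A single mixed batch would process 180,000 padded tokens, while the bucketed batches process 41,200. Production serving systems schedule work in more sophisticated ways, so the numbers show the direction of the effect rather than its real size.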
Xiaomi's move is seen as a proactive response to the current intense competition in large model commercialization. As price barriers continue to drop, the MiMo series' cost-effectiveness will become even more pronounced, accelerating the deep integration of AI capabilities across vertical industries and developer workflows.