Xiaomi is continuing its steady push into large language models. After introducing MiMo-7B in May 2025 and following it up with MiMo-V2-Flash in December, Xiaomi has now announced three new models: MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS.

All three are already being integrated across Xiaomi’s own ecosystem, including MiMo Studio, Xiaomi Browser, and Kingsoft Office, while also being accessible through developer tools like OpenClaw, OpenCode, and Cline. There’s also a one-week free trial for developers.

Xiaomi MiMo-V2-Pro

The headline model here is MiMo-V2-Pro, which Xiaomi positions as its flagship for what it calls the “agent era.” It’s built for heavy, real-world workloads, with more than 1 trillion total parameters and a 1-million-token context window.

Xiaomi claims MiMo-V2-Pro can handle complex tasks like workflow orchestration and long-term planning without human input, especially within agent frameworks. The company also says its performance is close to models like Claude Opus 4.6, while costing significantly less to use via API. Pricing starts at $1 per million tokens for input within smaller contexts, scaling up for larger ones.
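To put the entry price in perspective, here is a minimal sketch of the arithmetic, assuming the $1 per million input tokens rate quoted above. The helper name is hypothetical, and the sketch deliberately ignores output tokens and the higher tiers for larger contexts, since those rates are not given here.

```python
# Only the $1-per-million-input-tokens rate comes from the article;
# the function name and everything else here are illustrative assumptions.
INPUT_USD_PER_MILLION_TOKENS = 1.00


def estimate_input_cost(input_tokens: int,
                        rate: float = INPUT_USD_PER_MILLION_TOKENS) -> float:
    """Return the estimated input cost in USD at the entry-tier rate."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * rate


# A 250,000-token prompt at the entry tier:
print(f"${estimate_input_cost(250_000):.2f}")  # prints $0.25
```

Real bills would also include output tokens and any long-context surcharge, so treat this as a lower bound.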

The model is integrated into Kingsoft’s WPS Office tools, where it can work across Word, Excel, PowerPoint, and PDFs. 

Xiaomi MiMo-V2-Omni and MiMo-V2-TTS

MiMo-V2-Omni takes a different approach, focusing on multimodal tasks. It’s designed to process audio, images, and video together, with Xiaomi claiming strong performance in areas like audio understanding and visual reasoning.

The model can handle long audio inputs, multi-speaker scenarios, and combined audio-video analysis, which points to broader use cases beyond text. Xiaomi says its audio understanding even surpasses models like Gemini 3 Pro in some cases.

Meanwhile, MiMo-V2-TTS is Xiaomi’s speech synthesis model. It lets users adjust tone, emotion, and speaking style at a detailed level. Xiaomi says it can handle everything from natural conversation to singing, with support for multiple Chinese dialects.

All three models are now available via Xiaomi’s API platform, with relatively aggressive pricing. Most large tech companies are building similar stacks of text, multimodal, and voice models. What stands out is how quickly Xiaomi is iterating and how tightly these models are being woven into its existing software ecosystem.
