Chinese AI company DeepSeek has released version 3.1 of its flagship large language model, expanding the context window to 128,000 tokens and increasing the parameter count to 685 billion. The update was announced quietly through the company’s WeChat user group on August 19, without any posts on its official social media channels.
What’s New?
The most significant change in DeepSeek V3.1 is the increased context length, which now allows the model to handle inputs equivalent to a 300- to 400-page book. This improvement enables better performance in long-form content generation, technical document analysis, and extended multi-turn conversations. According to the company's official user group, the expanded context was already supported internally in the previous V3 version and has now been formally enabled across all interfaces.
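As a rough sanity check on the book comparison, the short Python sketch below divides the 128,000-token window by the 300 and 400 page counts quoted above. The helper name `tokens_per_page` is purely illustrative, and the result is only the density implied by the article's figures, not a measured property of any tokenizer.

```python
# Back-of-the-envelope check: how many tokens per page does the
# "300- to 400-page book" comparison imply for a 128,000-token window?
# Both figures come from the article; nothing here is measured.

CONTEXT_TOKENS = 128_000


def tokens_per_page(context_tokens: int, pages: int) -> float:
    """Return the average tokens per page implied by a page count."""
    return context_tokens / pages


# A 400-page book implies fewer tokens per page than a 300-page book.
low_density = tokens_per_page(CONTEXT_TOKENS, 400)
high_density = tokens_per_page(CONTEXT_TOKENS, 300)

print(f"Implied density: {low_density:.0f} to {high_density:.0f} tokens per page")
```

Actual token counts depend on the language, the formatting, and the tokenizer, so this range should be read as an order-of-magnitude guide only.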

Mixture-of-Experts and Benchmark Scores
DeepSeek V3.1 continues to use a Mixture-of-Experts (MoE) architecture, with only 37 billion parameters activated per token. The model supports multiple precision formats, including BF16, FP8, and F32, providing more flexibility for different deployment environments. Developers can access the model via API or download it from Hugging Face under the MIT open-source license.
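To put those figures in perspective, the following Python sketch computes the share of parameters active per token and a naive estimate of weight storage in each listed precision. It assumes the standard widths of 4 bytes for F32, 2 bytes for BF16, and 1 byte for FP8, and it assumes every parameter is stored in a single format, which a real checkpoint need not do. The variable names are illustrative.

```python
# Rough arithmetic on the parameter counts quoted in the article.
# Assumption: all weights stored in one format; activations, KV cache,
# and any mixed-precision layout are ignored.

TOTAL_PARAMS = 685e9    # total parameters (from the article)
ACTIVE_PARAMS = 37e9    # parameters activated per token (from the article)

# Standard storage width of each numeric format, in bytes per parameter.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "FP8": 1}

# Fraction of the model that participates in computing any one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Naive weight-only footprint in gigabytes (1 GB = 1e9 bytes).
weights_gb = {
    fmt: TOTAL_PARAMS * width / 1e9 for fmt, width in BYTES_PER_PARAM.items()
}

print(f"Active per token: {active_fraction:.1%} of all parameters")
for fmt, size in weights_gb.items():
    print(f"{fmt}: about {size:,.0f} GB of weights")
```

The small active fraction is what keeps per-token compute well below that of a dense model of the same total size, although all of the weights still have to be stored somewhere.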
The upgraded model performed well in early third-party benchmarks. It scored 71.6% on the Aider coding test, placing it above Claude Opus 4 and making it one of the strongest open-source coding models currently available. DeepSeek V3.1 also showed improved performance in math and logic tasks. However, some users noted no clear gains in reasoning compared to the earlier R1-0528 model.
Shift in Strategy
DeepSeek has removed all references to the R1 model from its chatbot interface, signaling a shift toward a single hybrid model architecture. The company appears to have integrated its reasoning capabilities into V3.1, instead of maintaining a separate reasoning model.
The training cost for V3.1 has not been disclosed. However, according to previous reports, the original V3 model was trained on 2.788 million GPU hours using Nvidia H800 chips, at an estimated cost of $5.6 million. That model formed the base for the current version, which likely shares similar infrastructure with additional refinements.
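The two reported figures for the original V3 model imply a rental rate per GPU hour, which the Python sketch below works out. It uses only the numbers quoted above. The derived rate is an inference from those numbers, not a disclosed price, and it says nothing about what V3.1 cost.

```python
# Implied cost per GPU hour for the original V3 training run,
# derived only from the two figures reported in the article.

GPU_HOURS = 2.788e6         # reported H800 GPU hours
ESTIMATED_COST_USD = 5.6e6  # reported estimated training cost

implied_rate = ESTIMATED_COST_USD / GPU_HOURS

print(f"Implied rate: about ${implied_rate:.2f} per GPU hour")
```

The result comes out at roughly two dollars per GPU hour, so the headline estimate reflects rented compute time only and would not include research, data, or staffing costs.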
Confusion Around the Delayed R2 Model
There had been widespread anticipation that DeepSeek’s next major release would be the long-awaited R2 model, designed to advance reasoning capabilities. Instead, V3.1 appeared as the company’s next step. According to a recent Financial Times report, the R2 model’s release has been delayed due to persistent technical issues involving Huawei’s Ascend AI chips.
DeepSeek was reportedly urged to use Ascend hardware to reduce reliance on Nvidia, aligning with China’s national strategy for AI self-sufficiency. Despite support from Huawei engineers, training on Ascend failed due to compatibility and performance issues. The company then switched to using Nvidia GPUs for training while retaining Ascend for inference. This hybrid setup introduced further complications and delays. In addition, the extended time spent on data labeling slowed down development. DeepSeek founder Liang Wenfeng reportedly expressed frustration with the slow progress.
Meanwhile, competitors such as Alibaba’s Qwen3 have moved ahead by deploying similar algorithms with more efficient execution. The episode has underscored the limitations of China’s domestic chip infrastructure and the challenges faced by startups attempting to meet political and technical demands simultaneously.
DeepSeek has not ruled out the launch of R2. However, whenever the model does arrive, its performance will face intense scrutiny. Until then, V3.1 stands as the company’s current flagship, serving both reasoning and non-reasoning workloads in a unified framework.