Huawei has officially launched its new AI inference framework, Unified Cache Manager (UCM), following earlier reports about the company’s plans to reduce reliance on high-bandwidth memory (HBM) chips.
As anticipated, UCM introduces a memory management solution aimed at accelerating large model inference. Huawei claims the framework raises system throughput and lowers latency by managing KV Cache data more efficiently across different memory tiers.

Huawei targets AI bottlenecks with UCM
UCM is built around a simple goal: make large AI models run faster without needing premium memory hardware. It introduces a hierarchical memory management structure for the KV Cache, which holds the attention keys and values of already-processed tokens so they do not have to be recomputed during inference. By spreading that data across HBM, standard DRAM, and SSDs, UCM places each piece according to its real-time latency needs.
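To make the tiering idea concrete, here is a minimal Python sketch of a cache that keeps recently used KV blocks in the fastest tier and pushes colder ones down to slower tiers. It is a toy illustration only, not Huawei's implementation: the class name `TieredKVCache`, the least-recently-used policy, and the block-count capacities are all assumptions made for this example, and UCM's actual placement logic has not been described in this level of detail.

```python
from collections import OrderedDict

# Tier names ordered from fastest to slowest (illustrative labels only).
TIERS = ("hbm", "dram", "ssd")


class TieredKVCache:
    """Toy cache: hot blocks stay in the fast tier, cold blocks are demoted.

    Hypothetical sketch; the LRU policy is an assumption, not UCM's design.
    """

    def __init__(self, capacities):
        # capacities maps tier name -> max number of blocks it may hold.
        self.capacities = capacities
        self.tiers = {name: OrderedDict() for name in TIERS}

    def _insert(self, level, key, value):
        name = TIERS[level]
        tier = self.tiers[name]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used in this tier
        if len(tier) > self.capacities[name]:
            # Evict the least recently used block from this tier.
            old_key, old_value = tier.popitem(last=False)
            if level + 1 < len(TIERS):
                self._insert(level + 1, old_key, old_value)  # demote
            # Otherwise the block falls out of the slowest tier entirely.

    def put(self, key, value):
        # Drop any stale copy, then place the block in the fastest tier.
        for tier in self.tiers.values():
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        # Search fastest to slowest; promote a hit back to the fastest tier.
        for tier in self.tiers.values():
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)
                return value
        return None  # miss: the caller would have to recompute the block

    def tier_of(self, key):
        for name in TIERS:
            if key in self.tiers[name]:
                return name
        return None


cache = TieredKVCache({"hbm": 1, "dram": 1, "ssd": 1})
for block_id in ("a", "b", "c"):
    cache.put(block_id, f"kv-for-{block_id}")
print(cache.tier_of("c"), cache.tier_of("b"), cache.tier_of("a"))  # hbm dram ssd
```

A real system would size tiers in bytes rather than block counts and would weigh the cost of moving data between tiers, but the promote-on-hit and demote-on-overflow pattern is the core of any hierarchy of this kind.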
Huawei tested UCM in real-world business applications at China UnionPay, including customer voice analysis, marketing planning, and office assistance. The company reported up to a 90% reduction in latency and a 22-fold increase in throughput. If these figures hold up, they suggest that software-level optimizations can partly compensate for hardware limitations.

Context: China’s limited access to HBM
HBM is a critical resource for running large AI models. It delivers high bandwidth and low latency for GPUs, allowing fast retrieval of massive parameter sets. However, global supply is dominated by SK Hynix, Samsung Electronics, and Micron Technology. China currently faces restrictions on HBM exports from the United States and its allies, limiting access to newer versions such as HBM3 and HBM4.
Huawei’s UCM arrives as a timely workaround. Instead of relying entirely on HBM, the software enables more flexible use of available memory. It supports inference at scale using conventional components, which could be a major benefit for AI deployments in China amid ongoing tech sanctions.

Open-source roadmap and ecosystem push
Huawei plans to open source UCM in September 2025. The first release will appear on the company’s MindSpore community platform. Huawei will later contribute the toolkit to mainstream inference engines and share it with ecosystem partners, including storage vendors aligned with its “Share Everything” architecture.
This mirrors Huawei’s broader strategy around its Ascend AI hardware. The company previously announced plans to open source its Compute Architecture for Neural Networks (CANN), an alternative to Nvidia’s CUDA, aimed at developers working on Ascend chips.

Software-led response to hardware restrictions
UCM reflects a wider trend in China’s AI sector, where software solutions are stepping in to counter limited hardware access. Startups like DeepSeek have already made progress using fewer chips by refining memory usage and optimizing model deployment. Huawei’s UCM continues this effort at the infrastructure level.
Zhou Yuefeng, Huawei’s vice president of storage product lines, said that UCM’s architecture can be deployed across different levels of storage, adapting to memory availability without impacting performance. This flexible structure allows for smoother operation in data centers, where memory demand fluctuates depending on workload.

US-China chip tensions and the shift to domestic solutions
The development comes amid ongoing tensions between China and the United States over semiconductor access. The US government has tightened export restrictions on advanced AI chips and high-performance memory. China has responded by investing in its domestic semiconductor industry, but local players are still catching up. Companies like Yangtze Memory and Changxin Memory Technologies are working on HBM2 chips, while international rivals are already shipping HBM4.
Huawei remains at the center of this shift. The company has introduced its CloudMatrix 384 AI system and Ascend processor series as alternatives to Nvidia’s offerings. Chinese regulators have reportedly urged local firms to reduce reliance on Nvidia and AMD for sensitive or government-linked applications.
UCM is part of this broader strategy. By improving memory efficiency through software, Huawei is building a more self-reliant AI infrastructure that minimizes dependence on restricted hardware. As the AI race increasingly revolves around memory bandwidth and data flow, tools like UCM could help define how China deploys large-scale AI models in the years ahead.
In related news, Nvidia and AMD have reportedly agreed to pay 15% of their China chip revenue to the US government.