
A research team that includes Huawei Technologies says it has successfully used the firm’s Ascend 910C chips to complete post-training for the DeepSeek-V4-Pro model, marking a major step forward as China’s semiconductor industry tries to leap from supporting basic AI inference to more complex model training amid tightening US sanctions.
While Chinese chipmakers have found success in supporting AI inference – the relatively simple process of running an already-finished model to answer user prompts – they have struggled with training, the far more complex process of building or refining a model’s brain.
If initial “pre-training” teaches a model how to speak by absorbing massive amounts of data, post-training teaches it how to work: following human instructions, observing safety rules and carrying out specific tasks.
To achieve this, the researchers ran DeepSeek’s largest model to date – boasting 1.6 trillion parameters – on a computing cluster powered by at least 1,000 Huawei chips, according to a social media post from the Shenzhen government on Friday.
The team successfully conducted “full-parameter” post-training, meaning every one of the model’s parameters was updated and refined rather than just a subset, the post said.
Previously, domestic computing power was used primarily for inference, “much like building a one-way road for the model: input a question, output an answer”, the post explained. This project, however, allowed the model to self-reflect and adjust.
This added “complex flyovers and loops to that one-way road, instantly multiplying the computational and communication demands by several times”, it added.
The exploration – jointly conducted by Huawei, the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology and the Shenzhen Research Institute of Big Data – “will help enhance the self-reliance of China’s AI industry chain”, the post said.
Source: South China Morning Post