ByteDance says new AI technology boosts model training efficiency by 1.7 times


TikTok owner ByteDance said it has achieved a 1.71 times efficiency improvement in large language model (LLM) training, making it the latest Chinese tech company to report a breakthrough that could reduce demand for Nvidia’s high-end graphics processing units (GPUs).

The company’s Doubao development team said it sped up LLM training by “1.71 times” through COMET, an optimised Mixture-of-Experts (MoE) system, according to a recent paper published on arXiv, an open online repository for scientific preprints.

MoE is a machine learning technique in which multiple expert networks divide a problem space into homogeneous regions, with only a few experts activated for each input. The technique has been extensively adopted to scale LLMs past a trillion parameters while keeping computing costs fixed, and underpins leading artificial intelligence (AI) models such as Grok and DeepSeek.
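To illustrate the idea, here is a minimal sketch of MoE-style top-k routing in plain NumPy. All sizes and variable names are hypothetical choices for this example, not taken from ByteDance’s system; the point is only that each token activates a small, fixed number of experts, so per-token compute stays constant however many experts the model holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen purely for illustration.
num_tokens, d_model, num_experts, top_k = 5, 8, 4, 2

tokens = rng.standard_normal((num_tokens, d_model))
router_w = rng.standard_normal((d_model, num_experts))
expert_w = rng.standard_normal((num_experts, d_model, d_model))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# The router scores every token against every expert.
scores = softmax(tokens @ router_w)          # shape: (num_tokens, num_experts)

# Each token is sent only to its top-k experts, which is why compute
# per token stays fixed no matter how many experts (parameters) exist.
top_experts = np.argsort(scores, axis=-1)[:, -top_k:]

output = np.zeros_like(tokens)
for t in range(num_tokens):
    gate = scores[t, top_experts[t]]
    gate = gate / gate.sum()                 # renormalise over chosen experts
    for g, e in zip(gate, top_experts[t]):
        output[t] += g * (tokens[t] @ expert_w[e])
```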

The headquarters of ByteDance is seen in Beijing on September 16, 2020. Photo: AFP

The new system has already been deployed in the company’s production environment, on clusters of more than 10,000 GPUs, where it has achieved “savings of millions of GPU hours”, according to the Doubao team.

Breakthroughs that reduce the cost of training AI models could lead to lower demand for chips from Nvidia, whose high-performance GPUs are subject to strict export controls by the US.

The rise of Hangzhou-based DeepSeek, which developed and trained its AI models at a fraction of the cost and with fewer computing resources than its Western counterparts, led to speculation that demand for Nvidia GPUs could slow. After the DeepSeek breakthrough, Nvidia lost nearly US$600 billion in market value last month, the largest single-day decline for any US company, before rebounding the next day.

Although widely adopted by major tech companies, the MoE technique incurs heavy communication between the chips hosting different experts, which “introduces a notable impairment of computational efficiency”, according to the ByteDance scientists. COMET tackles this through fine-grained “communication-computation overlapping”, hiding data transfers behind ongoing calculations, with the aim of “eliminating fine-grained communication bottlenecks and enhancing its adaptability across various scenarios”, they wrote.
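The sketch below illustrates the general idea of communication-computation overlap on a single GPU, using PyTorch CUDA streams; a device-to-device copy stands in for the cross-chip transfers of a real MoE system. This is a toy under those assumptions, not ByteDance’s COMET implementation.

```python
import torch

# Toy overlap demo: while a "communication" step (a device copy standing
# in for an MoE all-to-all transfer) runs on a side stream, an unrelated
# matrix multiplication proceeds on the default stream.
assert torch.cuda.is_available(), "requires a CUDA-capable GPU"

comm_stream = torch.cuda.Stream()

x = torch.randn(1024, 1024, device="cuda")
w = torch.randn(1024, 1024, device="cuda")
staged = torch.empty_like(x)

with torch.cuda.stream(comm_stream):
    staged.copy_(x, non_blocking=True)   # "communication" on the side stream

y = x @ w                                # computation overlaps the copy

# Block the default stream until the transfer finishes before using it.
torch.cuda.current_stream().wait_stream(comm_stream)
z = staged @ w
```

The essential design point is that the default stream never idles while data is in flight; real systems apply the same principle at a much finer granularity, splitting expert computation and transfers into small interleaved chunks.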


