
Chinese AI company says breakthroughs enabled creating a leading-edge AI model with 11X less compute — DeepSeek’s optimizations highlight limits of US sanctions


DeepSeek, a Chinese AI startup, says it has trained an AI model comparable to the leading models from heavyweights like OpenAI, Meta, and Anthropic, but at an 11X reduction in the amount of GPU compute, and thus cost. The startling announcement suggests that while US sanctions have impacted the availability of AI hardware in China, clever scientists are working to extract the utmost performance from limited amounts of hardware. Advances of this kind may ultimately reduce the impact of choking off China's supply of AI chips.

DeepSeek trained its DeepSeek-V3 Mixture-of-Experts (MoE) language model with 671 billion parameters using a cluster containing 2,048 Nvidia H800 GPUs in just two months, amounting to 2.8 million GPU hours, according to its paper. For comparison, it took Meta 11 times more compute power (30.8 million GPU hours) to train its Llama 3 model with 405 billion parameters on a cluster containing 16,384 H100 GPUs over the course of 54 days.
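As a rough sanity check, the 11X figure follows directly from the two GPU-hour totals quoted above. The short Python sketch below (variable names are illustrative) simply reproduces the ratio:

```python
# Minimal sketch: check the ~11X compute gap from the figures quoted above.
# GPU-hour totals are taken from the article; "11X" is just their ratio.

deepseek_v3_gpu_hours = 2.8e6    # ~2.8 million H800 GPU hours (DeepSeek-V3, 671B params)
llama3_405b_gpu_hours = 30.8e6   # 30.8 million H100 GPU hours (Llama 3, 405B params)

ratio = llama3_405b_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3 405B used roughly {ratio:.1f}x the GPU hours of DeepSeek-V3")
# -> roughly 11.0x, consistent with the "11X less compute" claim
```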



