In the ever-evolving landscape of artificial intelligence, the past week has been dominated by discussion of a breakthrough in China's AI capabilities, specifically a monumental leap by a startup called DeepSeek. The sudden attention seems to have caught Silicon Valley, and indeed the global tech community, somewhat off guard.
What sparked this fervor was the unveiling of DeepSeek-R1, a reasoning model that has drawn comparisons to established contenders such as OpenAI's o1. It is not just its performance, which rivals that of its Western counterparts, that has experts buzzing, but also its remarkably low service costs and, perhaps most strikingly, the fact that its code and model architecture are entirely open-source.
Several notable figures in the tech sphere have expressed astonishment at DeepSeek's swift rise. Alexandr Wang, the founder of Scale AI, remarked that while the U.S. may have led the AI race for the past decade, the emergence of DeepSeek's large language model could “change everything.” This sentiment was echoed by Ion Stoica, a professor at UC Berkeley, who noted that DeepSeek's models deliver cutting-edge results at a fraction of the cost needed to train existing models such as GPT, Gemini, and Claude. Remarkably, on the university's large-model leaderboard, DeepSeek-R1 now ranks third among all models, proprietary and open-source alike.
But what exactly has produced the “shock” felt in Silicon Valley over DeepSeek? For years, the development of global AI large models has been locked in an arms race, fueled by scaling laws which hold that greater computing capacity and vast amounts of training data yield increasingly intelligent models. Major tech companies have been hoarding chips to ensure they possess ample computational power.
Recent reports from Omdia detail how Microsoft has emerged as the largest buyer of Nvidia's flagship Hopper chips, acquiring 485,000 units and accounting for roughly 20% of Nvidia's revenue over the past year. Meta follows in second place with 224,000 GPUs, while Amazon and Google are projected to procure 196,000 and 169,000 Hopper chips, respectively.
However, as reliance on simply piling up data and computational resources began to raise eyebrows among scientists, DeepSeek, a subsidiary of the quantitative trading firm High-Flyer, burst onto the scene with surprising agility. To train its models, DeepSeek secured over 10,000 Nvidia GPUs before U.S. export restrictions took effect, although anecdotal reports suggest it may possess around 50,000 H100 chips, an assertion the company has not confirmed.
Last year, the team launched DeepSeek-V3, which exhibited exceptional cost-performance through optimizations in model architecture and infrastructure. The technical report revealed that the complete training of DeepSeek-V3 required only 2.788 million H800 GPU hours, translating into a modest training cost of roughly $5.576 million. The model's performance stands shoulder to shoulder with market behemoths such as GPT-4o and Claude 3.5 Sonnet from the American AI giants. Notably, AI scientist Andrej Karpathy commented that capabilities of this level usually require clusters of nearly 16,000 GPUs, while current market clusters often exceed 100,000 GPUs.
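The headline cost figure follows directly from the reported GPU-hours once a rental rate is assumed. A minimal sketch of that arithmetic is below; the $2-per-H800-hour rate is the assumption used in the report's own accounting, not an independent market figure.

```python
# Back-of-the-envelope reproduction of the DeepSeek-V3 training-cost figure.
# The $2-per-GPU-hour rental rate is an assumption (the rate used in the
# technical report's accounting); real procurement costs would differ.
h800_gpu_hours = 2_788_000          # total H800 GPU hours from the technical report
rental_usd_per_gpu_hour = 2.0       # assumed rental rate

total_cost_usd = h800_gpu_hours * rental_usd_per_gpu_hour
print(f"Estimated training cost: ${total_cost_usd / 1e6:.3f} million")  # ~ $5.576 million
```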
While concrete figures for the training cost of the newly launched DeepSeek-R1 remain undisclosed, its pricing is where the model truly shines. For API access, it charges a mere 1 yuan per million input tokens on a cache hit and 4 yuan on a cache miss, roughly 2% and 3.6% of OpenAI's corresponding prices.
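To see how those numbers translate into the quoted percentages, the sketch below compares DeepSeek-R1's input-token prices against assumed OpenAI prices; the OpenAI price points and the exchange rate are illustrative assumptions, not figures from the article.

```python
# Rough comparison of input-token API pricing (per million tokens).
# The OpenAI prices and the exchange rate are assumptions for illustration only.
USD_TO_CNY = 7.25                    # assumed exchange rate

deepseek_cached_cny = 1.0            # DeepSeek-R1, cache hit (from the article)
deepseek_uncached_cny = 4.0          # DeepSeek-R1, cache miss (from the article)

openai_cached_usd = 7.50             # assumed cached-input price
openai_uncached_usd = 15.00          # assumed uncached-input price

for label, ds_cny, oa_usd in [("cache hit", deepseek_cached_cny, openai_cached_usd),
                              ("cache miss", deepseek_uncached_cny, openai_uncached_usd)]:
    ratio = ds_cny / (oa_usd * USD_TO_CNY)
    print(f"{label}: DeepSeek charges about {ratio:.1%} of the assumed OpenAI price")
```

Under these assumed prices the ratios land near 1.8% and 3.7%, consistent with the roughly 2% and 3.6% cited above.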
Beyond its exceptional cost-performance advantages, the fully open-source nature of DeepSeek's models sets it apart in a way few can match.
Open-sourcing a model means making its source code and technical details public, allowing users to use and modify it to suit their needs. This approach improves technological transparency, makes adoption easier and cheaper for users, and empowers developers while averting monopolistic practices. Closed-source models, by contrast, are controlled entirely by their vendors, leaving users with restricted access and no room for modification.
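In practice, “open” here means the published weights can simply be downloaded and run locally, with no vendor API in the loop. The sketch below assumes the Hugging Face transformers library and uses one of the small distilled R1 checkpoints for illustration, since the full DeepSeek-R1 is a 671B-parameter mixture-of-experts model far too large for a single machine.

```python
# Minimal sketch of using an openly published checkpoint: download the weights
# from Hugging Face and generate text locally, no vendor API required.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Distilled variant chosen for illustration; the full DeepSeek-R1 is much larger.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain briefly why open-source models lower adoption costs."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```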
DeepSeek-R1 has quickly become the most downloaded large model on the open-source platform Hugging Face, accumulating 109,000 downloads. The figure suggests that developers worldwide are eager to study the model and fold it into their own AI projects. That surge in interest was palpable on the 26th, when DeepSeek's servers experienced temporary service fluctuations, likely caused by the influx of users drawn in by the new release.
The releases of DeepSeek-V3 and DeepSeek-R1 have provided a boon to academic researchers, offering greater transparency through the disclosure of technical details.
This accessibility allows academic circles to identify potential optimizations in the tech stack and to pose novel research questions. DeepSeek's founder, Liang Wenfeng, articulated this sentiment in a recent interview, asserting that in the face of disruptive technology, the moat created by closed source is ephemeral; even if OpenAI were to keep its models closed, that would not prevent competitors from catching up.
Liang believes that “open source is more a cultural act than a commercial one. Giving is, in fact, an added honor, and a company that does so also gains cultural appeal.” According to a white paper released last year by the China Academy of Information and Communications Technology, the global count of AI large language models has reached 1,328, with 36% originating in China. China has thus emerged as the second-largest contributor to AI technology, trailing only the United States.
Alibaba Cloud has also announced over 100 new open-source AI models supporting 29 languages, addressing diverse application needs from coding to mathematics. Likewise, Chinese startups like Minimax and 01.AI have released their models as open source.
Meta's chief AI scientist Yann LeCun weighed in on the implications of DeepSeek's success, asserting that the real takeaway is not an intensifying threat from Chinese competitors but the importance of keeping AI models open source so that everyone can benefit. “They've crafted new ideas and built on others' work. Given that their findings are published and open-sourced, everyone stands to gain,” LeCun noted, highlighting the transformative power of open research and open source.
As the tech community digests the development, reports suggest that DeepSeek's progress has struck a nerve, instilling a sense of urgency within Meta's generative AI team.
Following this, Meta's chief executive Mark Zuckerberg announced an acceleration in the development of Llama 4, backed by plans to invest $65 billion in expanding data centers and deploying 1.3 million GPUs, an effort aimed at ensuring Meta AI remains the leading model in 2025.
Yet experts caution that China's capacity for “0 to 1” innovation in AI still deserves close scrutiny. According to DeepSeek's technical report, for example, the formal training run of DeepSeek-V3 cost around $5.58 million, but this figure omits the cost of prior research on architecture, algorithms, and data, as well as the experimental trials along the way. In large-model training, dead-end explorations can squander substantial computational resources, yet breakthroughs rarely come without this kind of “waste.”
For now, DeepSeek's large models do appear to be accelerating innovation, but that innovation chiefly demonstrates its worth in the “1 to 10” stage of refining existing approaches; how transformative it proves in genuinely groundbreaking “0 to 1” work remains to be seen.