01/30/2025

AI

Defending American AI Leadership: DeepSeek Is a Distraction, Not a Disruptor

Over the last week, DeepSeek has grabbed the world’s attention—and for good reason. Despite the constraints of export controls, the company used creative engineering to train a powerful AI model for just $6 million (for the final training run – more on this later). This is exactly why Anthropic co-founder and CEO Dario Amodei posted a call for tighter export controls. It’s an impressive feat—one that deserves recognition and will undoubtedly inspire further innovation across the broader ecosystem.

At the same time, it’s important to put DeepSeek’s achievement in perspective: it represents a savvy iteration on the first chapter of AI development and scaling rather than the beginning of a new chapter.


A Changing of the Guard: Post-Training over Pre-Training

The previous era was defined by both the scaling and optimization of pre-training. Early progress hinged on massive capital investments in compute and data during pre-training to drive performance gains. Leading labs—Anthropic, Google DeepMind, OpenAI—were focused on creating the foundational recipe for model training. DeepSeek’s success builds directly on those investments and wouldn’t have been possible without them.

Moreover, the focus on the ~$6M *final* training run, even if taken at face value, is misleading: that figure covers only the last DeepSeek-V3 training run itself. We suspect the “fully loaded” cost (training the base model, data collection and prep, SFT, RL, various research experiments, etc.) is comparable to that of models like Claude 3.5 Sonnet or OpenAI’s GPT-4o.

Algorithmically, DeepSeek focused their post-training regimen on highly scaled reinforcement learning (RL) to produce emergent reasoning abilities. This approach is also well known to US labs, which have been working with it for some time. Perhaps more importantly, US labs, because of their significant resources, have been researching *multiple* new RL post-training regimens simultaneously, and this, combined with their superior compute and infrastructure, is what will really unlock future acceleration in the march toward superintelligence. DeepSeek certainly delivered an impressive model in terms of its size, efficiency, and competence, but it still hasn’t fully caught up to the highest-performing foundation models. Anthropic’s Claude 3.5 Sonnet, for instance, remains well ahead of R1 in its reasoning ability for real-world coding use cases.
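
To make the technique concrete, here is a minimal, hypothetical sketch of RL post-training with a verifiable reward. This is not DeepSeek’s or any lab’s actual recipe; the task, the tiny stand-in policy, and the hyperparameters are all invented for illustration.

```python
# Minimal, hypothetical sketch of RL post-training with a verifiable reward.
# A toy policy stands in for a language model; the "reasoning task" is
# contrived so that correctness can be checked programmatically.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def reward(x: int, answer: int) -> float:
    # Verifiable outcome reward: 1 if the answer checks out, else 0.
    return 1.0 if answer == x % 5 else 0.0

# Stand-in for a model's output head: maps a task id to logits over answers.
policy = torch.nn.Linear(10, 5)
opt = torch.optim.Adam(policy.parameters(), lr=0.1)

for step in range(500):
    x = torch.randint(0, 10, (1,)).item()
    logits = policy(F.one_hot(torch.tensor(x), 10).float())
    dist = torch.distributions.Categorical(logits=logits)
    answer = dist.sample()              # the model "generates" an answer
    r = reward(x, answer.item())        # check it; no human labels needed
    loss = -dist.log_prob(answer) * r   # REINFORCE: reinforce what scored
    opt.zero_grad()
    loss.backward()
    opt.step()

# Over many steps, probability mass shifts toward answers that verify,
# which is the core loop behind RL-driven gains in reasoning.
```

Real post-training pipelines differ in almost every detail (policy-gradient variants such as PPO or GRPO, KL penalties against a reference model, batched rollouts over long chains of thought), but the reward-then-reinforce loop above is the shared core.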

The end goal is what really matters. If you are aiming for the moon, building a low-orbit satellite is a distraction. The large AI labs have the singular goal of developing AGI – the real moonshot. We believe that they uniquely have the talent, capital, and capabilities to win.


Getting to the Next Frontier Will Require Large Amounts of Capital…But It Will Be Worth It

The leading AI labs have for months been focused squarely on the new RL-based post-training regime, and progress is rapid. Today, models like DeepSeek’s R1 compare favorably on paper because, to their credit, DeepSeek moved quickly. More importantly, though, scaling laws for this new paradigm imply that long-term progress will require exponentially more compute and data: resources that only a few companies enjoy. The encouraging news is that we already know scaling this approach massively improves the capabilities of the next generation of models, expanding potential use cases in turn and once again widening the gap between the established labs and China’s capabilities.
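
As a rough illustration of why the resource requirements compound, assume the familiar power-law form from pre-training scaling studies carries over in spirit to the new paradigm:

```latex
% Canonical power-law scaling: loss L falls as a power of compute C,
% with a small exponent alpha (empirically on the order of 0.05 in
% early pre-training studies).
L(C) \approx \left(\frac{C_0}{C}\right)^{\alpha}
% Consequently, halving the loss requires multiplying compute by:
2^{1/\alpha} \quad (\alpha = 0.05 \;\Rightarrow\; 2^{20} \approx 10^{6}\times)
```

The exponent is the crux: small improvements in capability demand multiplicative, not additive, increases in compute, which is why deep-pocketed labs are structurally advantaged.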


Enterprises Need More Than a Model

Models are not products, and enterprises building on foundation models need more than just intelligence through an API. As with prior technology waves, they’ll need a set of infrastructure building blocks to support robust development, testing, deployment, monitoring, and governance of their AI applications at scale. Enterprises must care deeply about security, compliance, liability, safety, interpretability, and much more. It is hard to imagine that US companies would be willing to build on models of questionable origin. Companies such as Anthropic are focused on building enterprise-ready products – very different from shipping a model with unresolved issues around security, bias, censorship, and toxicity.


How We Win

There is no question that US leadership in AI has caught the attention of other nations. There are legitimate questions among the research community about the potential role of unauthorized distillation (i.e., querying a frontier model at scale and training on its outputs to replicate its capabilities, without tipping off the model provider or the model itself).
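
For readers unfamiliar with the mechanics, here is a deliberately simplified, hypothetical sketch of output-based distillation. The client object, its `complete` method, and the data format are all invented for illustration; real pipelines are far larger and more careful.

```python
# Hypothetical sketch of output-based distillation: harvest a stronger
# "teacher" model's completions, then fine-tune a "student" on them.
# `teacher_client` and its `complete` method are invented for illustration.
import json

def harvest(teacher_client, prompts: list[str]) -> list[dict]:
    """Collect teacher completions (e.g., chain-of-thought traces) as pairs."""
    pairs = []
    for prompt in prompts:
        completion = teacher_client.complete(prompt)  # hypothetical API call
        pairs.append({"prompt": prompt, "response": completion})
    return pairs

def write_sft_dataset(pairs: list[dict], path: str) -> None:
    """Write pairs in a typical supervised fine-tuning (SFT) JSONL layout."""
    with open(path, "w") as f:
        for example in pairs:
            f.write(json.dumps(example) + "\n")

# A student model is then trained with ordinary supervised fine-tuning on
# this file. The teacher's capabilities transfer through its *outputs*
# alone; its weights and training data are never touched, which is what
# makes unauthorized distillation hard for a provider to detect.
```

Providers can try to detect and deter this pattern (query-rate anomaly detection, output watermarking, shielding reasoning traces), which is exactly the class of countermeasures discussed below.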

We’re observing an emerging consensus that DeepSeek’s models were likely trained on chain-of-thought outputs from other leading frontier models, in violation of their terms of service. Microsoft and OpenAI are now formally investigating the possible unauthorized exfiltration of large quantities of training data by DeepSeek. Earlier stories echoing this suspicion were reported by TechCrunch and in a JPMC report yesterday.

The time has come for US frontier model companies to urgently prioritize research and development on new AI security measures. Shielding o1’s reasoning tokens from end users, as OpenAI does, is one such measure. But by now it’s likely that jailbreaks have been discovered to circumvent these rudimentary protections.

Export controls on high-end chips have been somewhat effective at slowing China’s AI development, but these controls must be tightened further and robustly enforced. As of today, China is actively working to bypass the restrictions, using shell companies to access advanced chips through third-party countries. This will continue, and the new administration needs to be prepared to take further action.

The US *is* where the brightest minds have assembled to drive this generational shift forward. For AI to be safe, secure, and trusted, we need AI to be built in the US, for the US and its allies.

This is a race we can win – now we need to do the work to ensure that we do.
