DeepSeek stole our tech... says OpenAI

Fireship · 5 min read

Based on Fireship's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI and Microsoft are reportedly accusing DeepSeek of IP theft tied to distillation using OpenAI API outputs, allegedly in violation of OpenAI's terms of service.

Briefing

OpenAI and Microsoft are reportedly accusing DeepSeek of intellectual-property theft, specifically alleging that DeepSeek used "distillation" techniques to fine-tune its models on outputs from OpenAI, an approach they say violates OpenAI's terms of service. The dispute matters because it strikes at the boundary between legitimate model training and copying: distillation can transfer capabilities from a large, expensive model to a smaller one, but the allegation is that DeepSeek did so using OpenAI's API outputs at scale.

The claims come amid a broader shockwave from DeepSeek's rapid rise. DeepSeek, a lab backed by a Chinese hedge fund, is described as having built a state-of-the-art reasoning model that reportedly surpassed OpenAI's model while spending only $5.5 million, then offering a 100% discount code that undercut Big Tech's pricing power. That narrative frames the current controversy as more than a technical dispute: it is also a fight over whether AI progress is being throttled by expensive infrastructure, and whether open, low-cost training can break the dominance of companies pushing massive data-center spending.

So far, hard evidence has not been publicly detailed, but screenshots circulating online allegedly show DeepSeek producing responses that look indistinguishable from ChatGPT. Critics argue that this isn’t automatically proof, since similar content can be learned organically from widely available text. Microsoft, however, is said to have observed activity in China involving large-volume extraction of data from the OpenAI API, with accounts potentially linked to DeepSeek. In this telling, DeepSeek becomes “Robin Hood” for some—stealing from the rich to empower cheaper, more accessible models—while OpenAI frames it as unauthorized appropriation.

Distillation itself is not inherently controversial in the transcript's framing. Models can be distilled from other open models like Llama and Qwen, and even OpenAI's models could in principle be distilled, so long as the process doesn't use OpenAI's API outputs to build a rival model, which OpenAI's terms prohibit. The core accusation, then, is not the concept of distillation but the alleged source and method.

The controversy unfolds alongside a fast-moving China-vs-China model race. New releases are cited, including Alibaba's Qwen 2.5 Max and Moonshot AI's Kimi k1.5, both described as outperforming major Western competitors on benchmarks. Meanwhile, DeepSeek is criticized for heavy censorship but also noted as relatively easy to jailbreak for skilled prompt engineers. A separate technical talking point highlights DeepSeek's claimed 10x efficiency gains from bypassing CUDA and programming Nvidia GPUs directly with PTX (parallel thread execution), Nvidia's lower-level instruction set.

The transcript closes by emphasizing a larger trend: open source is gaining ground, and developers should build products on top of it. It also promotes PostHog as an open-source, self-hostable analytics and experimentation tool, positioning it as a practical way to ship better features as the AI landscape accelerates.

Cornell Notes

OpenAI and Microsoft are reportedly accusing DeepSeek of IP theft tied to "distillation," alleging DeepSeek used OpenAI API outputs to fine-tune models in a way that violates OpenAI's terms. Distillation can legitimately transfer knowledge from one model to another, but the dispute centers on the alleged source and scale of extracted outputs. Public proof is described as limited: screenshots circulate, but similar outputs can appear from organic training data, while Microsoft is said to have observed large-volume API extraction activity linked to accounts in China. The allegations land as DeepSeek's efficiency and low training cost fuel a broader open-model race, with new Chinese releases like Alibaba's Qwen 2.5 Max and Moonshot AI's Kimi k1.5 adding pressure on Western leaders. The stakes are both technical (how models are trained) and economic (whether open, cheaper systems can undercut incumbents).

What does “distillation” mean in this dispute, and why is it central to the accusation?

Distillation is a training approach where a smaller model learns from the outputs of a larger, more expensive model. In the transcript’s framing, OpenAI and Microsoft object to using OpenAI’s API outputs as the knowledge source to build a rival model. Distillation is portrayed as generally acceptable when it uses open models (e.g., distilling from Llama or Qwen), but the alleged violation is tied to extracting large volumes from OpenAI’s API and then fine-tuning based on those outputs.
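The transcript describes distillation only at a high level, but the core mechanic is simple: the student is trained to match the teacher's output distribution rather than just hard labels. A minimal, hypothetical sketch in plain Python (the function names are illustrative, and a real pipeline would use framework tensors and typically mix this loss with a standard cross-entropy term):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T produces "softer" distributions
    # that expose more of the teacher's relative preferences between classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence KL(teacher || student) between temperature-softened
    # distributions. The student is trained to minimize this, so it learns
    # to mimic the teacher's behavior rather than raw ground-truth labels.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

If the student's logits match the teacher's, the loss is zero; the more their distributions diverge, the larger it grows. The accusation in the transcript is not about this math, but about where the teacher outputs came from.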

Why aren’t screenshots alone considered decisive evidence?

Screenshots are described as “not a smoking gun” because similar responses can be found widely online and learned organically during training. If a model has been trained on large public corpora, it may reproduce common ChatGPT-like phrasing without needing direct access to OpenAI’s outputs. That’s why the transcript contrasts screenshots with the more serious claim: observed large-volume API extraction activity.

What specific behavior does Microsoft allegedly observe that links the activity to DeepSeek?

Microsoft is said to have observed someone in China extracting large volumes of data from the OpenAI API. The transcript adds that these accounts may be linked to DeepSeek. The implication is that the extracted API outputs could have served as the raw material for distillation-based fine-tuning.

How does the transcript portray DeepSeek’s efficiency and hardware approach?

DeepSeek is credited with achieving "10x better efficiency" partly by bypassing CUDA, Nvidia's proprietary GPU programming platform, and instead using PTX (parallel thread execution), Nvidia's lower-level instruction set, directly. The transcript compares this to building a website with assembly-level control. The point is that DeepSeek's engineering choices may reduce compute cost and improve throughput, helping it compete despite its low reported training spend.

What other criticisms and constraints are mentioned beyond IP theft?

The transcript notes complaints that DeepSeek is highly censored, while also claiming it can be “relatively easy to jailbreak” for senior prompt engineers. It also raises a privacy concern: using DeepSeek on the web supposedly sends prompts and keystrokes to China, with the advice that users should run it locally if they care about privacy.

How do new model releases change the stakes of the OpenAI–DeepSeek dispute?

The transcript situates the controversy inside a fast-moving China-vs-China race. It cites Alibaba's Qwen 2.5 Max and Moonshot AI's Kimi k1.5, both described as outperforming major Western competitors on benchmarks. That context suggests the market impact isn't just legal: on performance and cost alone, new open models can quickly shift developer and user attention.

Review Questions

  1. What distinction does the transcript draw between legitimate distillation and the alleged misuse of OpenAI API outputs?
  2. Why might a model produce “ChatGPT-like” responses without any direct copying from OpenAI?
  3. How do DeepSeek’s claimed efficiency gains (avoiding CUDA) relate to its broader competitive narrative?

Key Points

  1. OpenAI and Microsoft are reportedly accusing DeepSeek of IP theft tied to distillation using OpenAI API outputs, allegedly in violation of OpenAI's terms of service.

  2. Screenshots of overlapping responses are treated as weak evidence because similar outputs can emerge from organic training on widely available text.

  3. Microsoft is said to have observed large-volume extraction from the OpenAI API by accounts in China, potentially linked to DeepSeek.

  4. Distillation is portrayed as acceptable when it transfers knowledge from open models like Llama and Qwen, but disputed when it relies on OpenAI API outputs to build a rival.

  5. DeepSeek's rise is framed around low training cost, aggressive pricing, and claimed 10x efficiency gains from bypassing CUDA.

  6. The transcript highlights additional controversies: heavy censorship, jailbreakability, and privacy concerns when using the web version.

  7. Open-source model momentum is emphasized, with new releases like Alibaba's Qwen 2.5 Max and Moonshot AI's Kimi k1.5 intensifying competitive pressure.

Highlights

OpenAI–Microsoft’s core complaint centers on distillation allegedly powered by large-scale OpenAI API extraction, not merely on generic model similarity.
Screenshots aren’t treated as proof because common phrasing can be learned from public data; the stronger claim is observed API scraping activity.
DeepSeek's efficiency narrative includes bypassing CUDA and using Nvidia's PTX (parallel thread execution) instruction set directly, aiming to cut compute costs.
The dispute unfolds during a rapid China-led model release cycle, with Qwen 2.5 Max and Kimi k1.5 cited as benchmark threats.