OpenAI Is Actually Terrible
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
OpenAI’s public complaints about DeepSeek R1 are framed as hypocrisy: the same company that relies on large-scale training data and model distillation is portrayed as arguing that others’ use of similar techniques violates rules. The transcript points to OpenAI’s Terms of Service—specifically claims that commercial “distillation” is not allowed—then contrasts that with alleged past behavior: training on GPL-licensed code, scraping GitHub in ways that could break licenses, using images with unclear permissions, and drawing from Twitter data that may not extend to third-party training. The core claim is that OpenAI’s stance is “terrible” not because copyright law is settled, but because it targets competitors while benefiting from the very ecosystem of data and reuse it criticizes.
That tension is then broadened into a policy argument about copyright and AI outputs. A U.S. Copyright Office report is cited as saying existing copyright principles can flex to cover generative AI, with protection for AI outputs only when a human author contributes “sufficient expressive elements.” The transcript treats this as internally inconsistent: if generative systems can ingest and remix copyrighted material, then later producing something that can be copyrighted seems to undermine the logic of ownership. The discussion leans on a common internet grievance—people being sued or criticized for actions they themselves rely on—suggesting a future where everyone is entangled in litigation while practical enforcement remains unclear.
The transcript also pushes back on the idea that DeepSeek R1 proves AI is “over” or that future models will be cheap with no major investment. It argues that R1’s low cost (the transcript cites a figure of roughly $5.5 million, while also implying the number may be understated) still depends on expensive groundwork: R1 is described as distilling or “drafting off” a model that cost “billions” to build. The key logic is “no free lunch”: a cheap model can’t exist without costly training or access to a costly base model. In that framing, R1’s efficiency is real but not magical: it’s the result of performance work, including what the transcript describes as writing its own low-level alternative to CUDA rather than using CUDA directly, and optimizing the inference path.
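To make the “drafting off” claim concrete: distillation, in its classic form, trains a small student model to imitate a large teacher's output probabilities rather than learning from raw data alone. The sketch below is a minimal, hypothetical illustration of that loss signal in pure Python (not DeepSeek's or OpenAI's actual method): the teacher's softened probabilities become targets, and the student pays a KL-divergence penalty for disagreeing with them.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T flattens the distribution,
    exposing more of the teacher's 'dark knowledge' about near-misses."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's: the signal a student minimizes when imitating a teacher."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy numbers (assumptions, not real model outputs): a student whose
# logits already match the teacher pays zero loss; a mismatched student
# pays more, and that gap is what drives its training.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, [4.0, 1.0, 0.5])
mismatched = distillation_loss(teacher, [0.5, 1.0, 4.0])
print(matched, mismatched)
```

The point of the sketch is the “no free lunch” argument in code form: the student's cheap training signal only exists because someone already paid for a teacher good enough to produce those target distributions.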
Finally, the transcript treats DeepSeek’s behavior as evidence of how LLMs work under the hood. When asked “who” a model is, it may answer incorrectly (e.g., returning “ChatGPT” variants), and the transcript attributes that to next-token prediction: the model generates the most likely continuation, and repeated jokes or expectations can create a self-fulfilling pattern in outputs. Overall, the message is less about whether AI can be regulated and more about how incentives, licensing, and model economics collide—producing both legal confusion and competitive posturing.
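The next-token explanation can be illustrated with a deliberately crude toy (a bigram counter, not a transformer; all training strings below are hypothetical): if the training data is saturated with one continuation, greedy decoding will return that continuation, which is why a model trained on text full of “I am ChatGPT” may “identify” as ChatGPT regardless of who built it.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which: a crude stand-in for a
    next-token predictor learned from text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Greedy decoding: always emit the single most frequent continuation."""
    return counts[word].most_common(1)[0][0]

# Hypothetical training snippets: if "I am ChatGPT" dominates the data,
# the majority continuation wins, producing the self-fulfilling identity
# answers the transcript describes.
corpus = [
    "I am ChatGPT",
    "I am ChatGPT",
    "I am ChatGPT",
    "I am DeepSeek",
]
model = train_bigrams(corpus)
print(most_likely_next(model, "am"))  # → ChatGPT
```

Real models sample from a learned distribution rather than raw counts, but the mechanism is the same: the output is the likely continuation, not a statement of fact about the model's provenance.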
Cornell Notes
The transcript argues that OpenAI’s complaints about DeepSeek R1 are hypocritical, pointing to alleged past training practices (including potentially license-violating code and image/data scraping) while OpenAI claims distillation for commercial use should be illegal. It then cites a U.S. Copyright Office report saying generative AI outputs can be copyrighted only when a human author adds sufficient expressive elements, and questions how that fits with the reality of AI remixing copyrighted inputs. On the competition side, it rejects the idea that R1 proves AI will become cheap overnight, claiming R1’s performance likely depends on distilling from a much more expensive “multi-billion” model. It also explains odd identity answers as normal behavior for next-token prediction, where expected continuations (including jokes) can shape outputs.
Why does the transcript call OpenAI’s position on distillation “hypocritical”?
What does the U.S. Copyright Office report claim about copyright for generative AI outputs?
Why does the transcript question the logic of copyright protection for AI outputs?
What economic argument does the transcript make against the idea that R1 proves AI is “cheap now”?
What technical detail does the transcript highlight about R1’s efficiency?
How does the transcript explain why models may answer “who they are” incorrectly?
Review Questions
- How does the transcript connect OpenAI’s Terms of Service claims to alleged past training practices, and what does that imply about competitive fairness?
- What conditions does the transcript cite for copyright protection of generative AI outputs, and why does it consider the logic inconsistent?
- According to the transcript’s “no free lunch” argument, what must exist behind a low-cost model like R1?
Key Points
1. OpenAI’s distillation-related complaints are portrayed as hypocritical when set against allegations that OpenAI trained on data and code with potentially incompatible licenses.
2. A U.S. Copyright Office report is cited as saying generative AI outputs can be copyrighted only when a human author contributes sufficient expressive elements.
3. The transcript argues that copyright rules become hard to reconcile with how generative models ingest and remix copyrighted material.
4. DeepSeek R1’s low cost is framed as dependent on distilling from a much more expensive, multi-billion-dollar base model rather than being evidence that AI training is now trivial.
5. R1 is credited with engineering optimizations, including building its own alternative to CUDA for performance.
6. Odd “identity” answers are explained as expected behavior from next-token prediction, especially when internet expectations and jokes shape likely continuations.