New FREE & Open Reasoning LLM Matches OpenAI o1! + RTX 5090 Unboxing! AI News
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
DeepSeek R1 is landing as a fully open-source reasoning model that performs essentially on par with OpenAI's o1 while also undercutting it on accessibility and cost. Benchmarks cited for DeepSeek R1 show it matching or slightly exceeding o1 on AIME 2024, with DeepSeek R1 at 79.8% accuracy and o1 at roughly the same level. The smaller DeepSeek R1 32B variant also holds up strongly: it posts 72.6% on AIME 2024, far ahead of OpenAI's o1-mini at 63.6% and dramatically above DeepSeek's older V3 model at 39.2%. The pattern repeats across other reasoning-heavy tests: DeepSeek R1 is described as "closing the gap" with o1 on code-focused evaluations (96.3% for R1 vs. 96.6% for o1), while the 32B version still reaches about 90%.
Not every benchmark is a tie. On GPQA, OpenAI's o1 retains a modest edge (75.7% vs. 71.5% for DeepSeek R1), though the margin is framed as relatively small. Math results look especially strong for DeepSeek R1, with 97.3% accuracy, within about 1% of o1, while the 32B model sits close at 94%. On SWE-bench Verified, DeepSeek R1 is presented as nearly matching o1 again, though the 32B variant lags behind o1, DeepSeek V3, and o1-mini on that particular measure. Even so, the overall takeaway is that DeepSeek's reasoning stack has improved sharply over its own prior generation, and that the 32B model delivers near state-of-the-art performance at a fraction of the size.
Beyond the headline model, DeepSeek is also releasing distilled smaller models "stemming from DeepSeek R1," including versions as small as 1.5B parameters. Those tiny models are claimed to outperform GPT-4o on some benchmarks, with larger distilled variants (7B, 14B, 32B, and even a 70B) pushing further. Pricing is another major point: the transcript cites extremely low token costs, 14 cents per million input tokens on cache hits and 55 cents per million on cache misses, with output around $2.19 per million, positioning the system as practical for builders who want to run reasoning at scale.
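For a sense of scale, here is a back-of-envelope cost sketch using the per-token rates cited above; the `estimate_cost` helper and the workload numbers are hypothetical, purely for illustration:

```python
# Back-of-envelope API cost estimate using the per-token rates cited above.
# The workload numbers below are hypothetical, for illustration only.

INPUT_CACHE_HIT = 0.14 / 1_000_000   # dollars per input token (cache hit)
INPUT_CACHE_MISS = 0.55 / 1_000_000  # dollars per input token (cache miss)
OUTPUT = 2.19 / 1_000_000            # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimated dollar cost for a batch of requests."""
    hits = input_tokens * cache_hit_ratio
    misses = input_tokens - hits
    return (hits * INPUT_CACHE_HIT
            + misses * INPUT_CACHE_MISS
            + output_tokens * OUTPUT)

# Example: 10M input tokens (half of them cached) plus 2M output tokens.
print(f"${estimate_cost(10_000_000, 2_000_000, cache_hit_ratio=0.5):.2f}")
# -> $7.83
```

At these rates, a reasoning workload of ten million input tokens and two million output tokens comes out to under ten dollars, which is the "run reasoning at scale" argument in concrete terms.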
The rest of the AI news shifts from model performance to infrastructure and generation workflows. A new open-source training framework from Yandex, YaFSDP (Yet another Fully Sharded Data Parallel), is pitched as a way to speed large language model training by reducing GPU communication bottlenecks, cutting memory usage, and saving the equivalent of up to 150 GPUs' worth of resources. For local deployment, Nvidia's RTX 5090 is mentioned via an upcoming unboxing, with its 32GB of VRAM highlighted as enough to run local models and potentially test DeepSeek R1 32B and even 70B-class models.
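To sanity-check that local-deployment claim, here is a rough weight-memory estimate; this is a minimal sketch, where the quantization levels and the 10% overhead factor are assumptions, and KV-cache and activation memory are ignored:

```python
# Rough VRAM needed just to hold model weights at various quantization levels.
# Ignores KV cache and activations; the 10% overhead factor is an assumption.

def weight_vram_gb(params_billions: float, bits_per_param: int,
                   overhead: float = 1.10) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total * overhead / 1e9

for params in (32, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")

# A 32B model at 4-bit (~17.6 GB) fits the RTX 5090's 32GB with headroom;
# a 70B model at 4-bit (~38.5 GB) exceeds 32GB and would need offloading
# or more aggressive quantization.
```

This is why 32GB of VRAM is framed as the interesting threshold: it comfortably holds a quantized 32B reasoning model, while 70B-class models remain a stretch.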
Finally, the transcript emphasizes a clear trend in AI video: consistency. Early-access "elements" features from Kling AI are shown combining multiple input images (including a person, a room, and a Red Bull can) into coherent scenes with consistent characters and backgrounds. Other tools and workflows, like LoRA fine-tuning for OneAI video generation, are framed as the path toward stable characters across clips, with examples such as a John Wick-style character that is difficult to distinguish from real footage. The segment closes with additional research and product updates in relighting, lightweight speech generation, and interactive image editing, all aimed at making AI outputs more controllable and usable in real workflows.
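Since LoRA fine-tuning is the named mechanism behind character consistency, here is a minimal sketch of the underlying idea in PyTorch; `LoRALinear`, the rank, and the layer size are illustrative assumptions, not any specific video tool's implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (W + scale * B @ A).

    Generic illustration of the LoRA idea: only the small matrices A and B
    are trained, so a lightweight adapter can specialize a large frozen
    model on, for example, a single character's appearance.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Original projection plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Hypothetical usage: wrap one projection layer of a pretrained network.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
```

The adapter trains only `rank * (in + out)` parameters per layer instead of `in * out`, which is why per-character LoRAs are cheap enough to fine-tune and swap between clips.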
Cornell Notes
DeepSeek R1 is presented as a fully open-source reasoning model that matches or slightly exceeds OpenAI's o1 on several benchmarks, with especially strong results on AIME 2024, code-focused tests, and math. The smaller DeepSeek R1 32B model is highlighted as unusually close to o1-level performance for its size, outperforming OpenAI's o1-mini on multiple tasks. The transcript also notes a modest OpenAI lead on GPQA and a weaker showing for the 32B model on SWE-bench Verified, but the overall improvement over DeepSeek's older V3 is described as dramatic. DeepSeek additionally releases distilled smaller models and cites very low API token pricing, making the system attractive for developers. The broader news theme ties into infrastructure (Yandex's YaFSDP) and video generation's move toward consistent characters and backgrounds via Kling "elements" and LoRA workflows.
- What benchmark results are used to claim DeepSeek R1 is on par with OpenAI o1?
- How does the DeepSeek R1 32B model change the "cost vs capability" story?
- Where does OpenAI still lead, and by how much?
- What does "open source" mean here, and why does the MIT license matter?
- How does the transcript connect model progress to cheaper training and better video generation?
- What pricing and hardware details are given to support local or scalable deployment?
Review Questions
- Which benchmarks in the transcript show DeepSeek R1 outperforming or matching OpenAI o1, and what are the approximate accuracy numbers?
- Why is the DeepSeek R1 32B model portrayed as a major practical milestone compared with o1-mini?
- What infrastructure bottleneck does Yandex FSDP target, and how is that expected to translate into training savings?
Key Points
1. DeepSeek R1 is presented as fully open source under the MIT license, enabling free download, modification, and developer improvements.
2. AIME 2024 results are used to claim DeepSeek R1 slightly exceeds OpenAI o1 (79.8% vs. roughly comparable performance), while DeepSeek R1 32B still posts 72.6%.
3. On code-focused benchmarks, DeepSeek R1 is nearly tied with o1 (96.3% vs. 96.6%), and the 32B variant is described as closing much of the gap (around 90%).
4. OpenAI's o1 retains a notable but relatively small lead on GPQA (75.7% vs. 71.5% for DeepSeek R1).
5. DeepSeek releases distilled smaller models down to 1.5B parameters and cites very low API token pricing (14¢/M input on cache hits, 55¢/M input on cache misses, ~$2.19/M output).
6. Yandex's YaFSDP is highlighted as an open-source training framework aimed at reducing GPU communication bottlenecks to speed LLM training and cut resource use.
7. AI video generation progress is framed around consistency: Kling "elements" and LoRA workflows are used to keep characters, backgrounds, and objects stable across clips.