
New FREE & Open Reasoning LLM Matches Open AI o1! + RTX 5090 Unboxing! AI News

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

DeepSeek R1 is presented as fully open source under the MIT license, enabling free download, modification, and developer improvements.

Briefing

DeepSeek R1 is landing as a fully open-source reasoning model that performs essentially on par with OpenAI’s o1 while also undercutting it on accessibility and cost. Benchmarks cited for DeepSeek R1 show it matching or slightly exceeding o1 on AIME 2024, with DeepSeek R1 at 79.8% accuracy and o1 at roughly the same level. The smaller DeepSeek R1 32B variant also holds up strongly: it posts 72.6% on AIME 2024, far ahead of OpenAI’s o1-mini at 63.6% and dramatically above DeepSeek’s older V3 model at 39.2%. The pattern repeats across other reasoning-heavy tests: DeepSeek R1 is described as “closing the gap” with o1 on code-focused evaluations (96.3% for R1 vs 96.6% for o1), while the 32B version still reaches about 90%.

Not every benchmark is a tie. On GPQA, OpenAI’s o1 retains a modest edge (75.7% vs DeepSeek R1 at 71.5%), though the margin is framed as relatively small. Math results look especially strong for DeepSeek R1, with 97.3% accuracy, just 1% behind o1, while the 32B model sits close at 94%. On SWE-bench Verified, DeepSeek R1 is presented as nearly matching o1 again, with the 32B variant lagging behind o1, DeepSeek V3, and o1-mini on that particular measure. Even so, the overall takeaway is that DeepSeek’s reasoning stack has improved sharply compared with its own prior generation, and that the 32B model delivers near state-of-the-art performance at a fraction of the size.

Beyond the headline model, DeepSeek is also releasing distilled smaller models “stemming from DeepSeek R1,” including versions as small as 1.5B parameters. Those tiny models are claimed to outperform GPT-4o on some benchmarks, with larger distilled variants (7B, 14B, 32B, and even a 70B) pushing further. Pricing is another major point: the transcript cites extremely low token costs of 14 cents per million input tokens on cache hits and 55 cents per million on cache misses, with output around $2.19 per million, positioning the system as practical for builders who want to run reasoning at scale.

The rest of the AI news shifts from model performance to infrastructure and generation workflows. A new open-source training framework, Yandex’s YaFSDP (fully sharded data parallel), is pitched as a way to speed large language model training by reducing GPU communication bottlenecks, cutting memory usage, and saving the resources of up to 150 GPUs. For local deployment, Nvidia’s RTX 5090 is mentioned via an upcoming unboxing, with 32GB of VRAM highlighted as enough to run local models and potentially test DeepSeek R1 32B and even 70B-class models.

Finally, the transcript emphasizes a clear trend in AI video: consistency. Early-access “elements” features from Kling AI are shown combining multiple input images (including a person, a room, and a Red Bull can) into coherent scenes with consistent characters and backgrounds. Other tools and workflows—like LoRA fine-tuning for OneAI video generation—are framed as the path toward stable characters across clips, with examples such as a John Wick-style character that’s difficult to distinguish from real footage. The segment closes with additional research and product updates in relighting, lightweight speech generation, and interactive image editing, all aimed at making AI outputs more controllable and usable in real workflows.

Cornell Notes

DeepSeek R1 is presented as a fully open-source reasoning model that matches or slightly exceeds OpenAI’s o1 on several benchmarks, with especially strong results on AIME 2024, code-focused tests, and math. The smaller DeepSeek R1 32B model is highlighted as unusually close to o1-level performance for its size, outperforming OpenAI’s o1-mini on multiple tasks. The transcript also notes a modest OpenAI lead on GPQA and a weaker showing for the 32B model on SWE-bench Verified, but overall improvement over DeepSeek’s older V3 is described as dramatic. DeepSeek additionally releases distilled smaller models and cites very low API token pricing, making the system attractive for developers. The broader news theme ties into infrastructure (Yandex’s YaFSDP) and video generation’s move toward consistent characters and backgrounds via Kling “elements” and LoRA workflows.

What benchmark results are used to claim DeepSeek R1 is on par with OpenAI o1?

The transcript cites AIME 2024 accuracy where DeepSeek R1 is about 79.8% and OpenAI o1 is roughly comparable, with DeepSeek R1 slightly ahead. For code-focused evaluation, DeepSeek R1 is listed at 96.3% versus o1 at 96.6%. For math, DeepSeek R1 is given as 97.3%, about 1% behind o1, while the 32B variant is close at 94%. On SWE-bench Verified, DeepSeek R1 is described as nearly tied with o1, though the 32B variant falls behind.

How does the DeepSeek R1 32B model change the “cost vs capability” story?

DeepSeek R1 32B is repeatedly framed as unusually strong for a smaller model. On AIME 2024 it reaches 72.6%, far above OpenAI o1-mini at 63.6%, and well above DeepSeek V3 at 39.2%. On code-focused tests it is described as closing much of the gap to o1 (about 90% vs 96.6% for o1). The transcript also emphasizes that 32B is close to state-of-the-art reasoning performance while being smaller and more practical to run.

Where does OpenAI still lead, and by how much?

The transcript points to GPQA as the clearest area where OpenAI’s o1 leads: o1 at 75.7% versus DeepSeek R1 at 71.5%, a gap of roughly 4 percentage points. It also notes that on SWE-bench Verified the 32B variant trails more noticeably, and that benchmark-by-benchmark performance can vary rather than staying perfectly tied.

What does “open source” mean here, and why does the MIT license matter?

DeepSeek R1 is described as fully open source and freely downloadable, with developers able to modify and improve it. The transcript specifies the model is under the MIT license, which generally permits broad reuse and modification, making it easier for teams to adapt the model for their own products and research without relying on a closed system.

How does the transcript connect model progress to cheaper training and better video generation?

For training, it highlights Yandex’s YaFSDP, an open-source fully sharded data parallel framework aimed at reducing GPU communication bottlenecks and optimizing GPU resources, with claims of up to 25% training-time reduction for Transformer models and savings of up to 150 GPUs’ worth of resources. For video generation, it emphasizes consistency as the next breakthrough: Kling AI’s “elements” feature combines multiple input images into coherent scenes, while LoRA fine-tuning is used to keep characters consistent across clips (e.g., a John Wick-style character).
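The core idea behind fully sharded data parallelism can be illustrated without any GPU framework: each worker keeps only a slice of the parameters and gathers the full set just in time, trading extra communication for much lower per-device memory. The toy sketch below is a conceptual illustration of that sharding idea only, not Yandex’s implementation; the function names are invented for the example.

```python
# Toy illustration of the "fully sharded" idea behind FSDP-style training:
# each worker stores only one shard of the parameters, and an all-gather
# reconstructs the full set when a layer actually needs it.

def shard(params, num_workers):
    """Split a flat parameter list into one contiguous shard per worker."""
    n = len(params)
    size = (n + num_workers - 1) // num_workers  # ceil division
    return [params[i * size:(i + 1) * size] for i in range(num_workers)]

def all_gather(shards):
    """Reconstruct the full parameter list from every worker's shard."""
    return [p for s in shards for p in s]

params = list(range(10))      # stand-in for model weights
shards = shard(params, 4)     # each of 4 workers holds at most 3 of the 10

# Per-worker memory drops from 10 weights to at most 3 ...
assert max(len(s) for s in shards) == 3
# ... and an all-gather recovers the full model on demand.
assert all_gather(shards) == params
```

The memory saving is the point: in a real framework the gathered copy is freed again after each layer's forward/backward pass, which is where the extra GPU communication (and hence the value of reducing its overhead) comes from.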

What pricing and hardware details are given to support local or scalable deployment?

The transcript cites token pricing of 14 cents per million input tokens on cache hits and 55 cents per million on cache misses, with output around $2.19 per million, positioning API usage as very inexpensive for general tasks. It also mentions Nvidia’s RTX 5090 unboxing with 32GB of VRAM, suggesting it could run local models, including DeepSeek R1 32B and potentially 70B-class models, plus image and video generation experiments.
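To make the "very inexpensive" claim concrete, here is a back-of-envelope cost calculator. It assumes the two quoted input rates are cache-hit and cache-miss tiers and uses DeepSeek's published ~$2.19 per million output-token rate; treat the absolute numbers as illustrative, since quoted prices change.

```python
# Rough API cost estimate at the per-million-token rates cited above.
# Assumption: the two input rates are cache-hit vs cache-miss tiers,
# and output is ~$2.19/M (DeepSeek's published R1 rate at launch).

HIT_RATE = 0.14     # $ per 1M input tokens served from cache
MISS_RATE = 0.55    # $ per 1M input tokens on cache miss
OUTPUT_RATE = 2.19  # $ per 1M output tokens

def api_cost(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimate the dollar cost of one workload."""
    hits = input_tokens * cache_hit_ratio
    misses = input_tokens - hits
    return (hits * HIT_RATE
            + misses * MISS_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000

# 1M input tokens (half cached) plus 200k output tokens:
cost = api_cost(1_000_000, 200_000, cache_hit_ratio=0.5)  # roughly $0.78
```

Even a million input tokens with no cache hits comes in at about 55 cents, which is the scale-friendly economics the transcript is pointing at.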

Review Questions

  1. Which benchmarks in the transcript show DeepSeek R1 outperforming or matching OpenAI o1, and what are the approximate accuracy numbers?
  2. Why is the DeepSeek R1 32B model portrayed as a major practical milestone compared with o1-mini?
  3. What infrastructure bottleneck does Yandex FSDP target, and how is that expected to translate into training savings?

Key Points

  1. DeepSeek R1 is presented as fully open source under the MIT license, enabling free download, modification, and developer improvements.

  2. AIME 2024 results are used to claim DeepSeek R1 slightly exceeds OpenAI o1 (about 79.8% vs roughly comparable performance), while DeepSeek R1 32B still posts 72.6%.

  3. On code-focused benchmarks, DeepSeek R1 is nearly tied with o1 (96.3% vs 96.6%), and the 32B variant is described as closing much of the gap (around 90%).

  4. OpenAI’s o1 retains a notable but relatively small lead on GPQA (75.7% vs 71.5% for DeepSeek R1).

  5. DeepSeek releases distilled smaller models down to 1.5B parameters and cites very low API token pricing (14¢/M input on cache hits, 55¢/M on cache misses, ~$2.19/M output).

  6. YaFSDP is highlighted as an open-source training framework aimed at reducing GPU communication bottlenecks to speed LLM training and cut resource use.

  7. AI video generation progress is framed around consistency: Kling “elements” and LoRA workflows are used to keep characters, backgrounds, and objects stable across clips.

Highlights

DeepSeek R1 is repeatedly benchmarked as matching o1-level reasoning, with the 32B variant still outperforming o1-mini on AIME 2024 (72.6% vs 63.6%).
Code performance is nearly tied: DeepSeek R1 at 96.3% versus OpenAI o1 at 96.6%, while DeepSeek V3 lags far behind at 58.7%.
Kling AI’s “elements” feature is showcased as producing coherent, consistent character and background composition from multiple input images (including a Red Bull can).
YaFSDP is pitched as cutting LLM training time by up to 25% by optimizing GPU communication and reducing memory usage.
