
Build anything with DeepSeek R1, here’s how

David Ondrej · 5 min read

Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

DeepSeek R1 is framed as open-source reasoning capability comparable to OpenAI o1, arriving shortly after o1’s release.

Briefing

DeepSeek R1 is positioned as an open-source reasoning model that matches OpenAI's o1-level performance at roughly 27x lower token costs, arriving only about 46 days after o1 launched. The pitch hinges on two practical advantages: cost and transparency. DeepSeek's pricing puts input at $0.55 per million tokens and output at $2.2 per million tokens, compared with o1's $15 (input) and $60 (output) per million tokens. That price gap matters because it makes "reasoning-heavy" applications feasible for individuals and small teams, not just large labs.
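Using the transcript's numbers, the ~27x figure falls straight out of the per-token prices. A quick check in Python (the token counts below are illustrative, not from the transcript):

```python
# Per-million-token prices quoted in the transcript (USD).
O1_INPUT, O1_OUTPUT = 15.00, 60.00   # OpenAI o1
R1_INPUT, R1_OUTPUT = 0.55, 2.20     # DeepSeek R1

def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# An illustrative reasoning-heavy request: 2k tokens in, 10k tokens out.
o1_cost = request_cost(2_000, 10_000, O1_INPUT, O1_OUTPUT)
r1_cost = request_cost(2_000, 10_000, R1_INPUT, R1_OUTPUT)
print(f"o1: ${o1_cost:.4f}, R1: ${r1_cost:.4f}, ratio: {o1_cost / r1_cost:.1f}x")
# → o1: $0.6300, R1: $0.0231, ratio: 27.3x
```

Because input and output prices are both about 27.3x lower, the ratio holds regardless of the input/output mix of a given request.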

Beyond benchmarks, the transcript emphasizes a custom evaluation and a qualitative difference in how reasoning can be inspected. A Twitter user reportedly built their own eval where DeepSeek R1 “destroyed” other models, and the model’s reasoning is shown via a visible chain-of-thought-style output. The creator contrasts this with OpenAI’s o1/o1-preview experience, where reasoning tokens are paid for but not shown to users. The ability to see the model “think through a problem,” including mistakes, is presented as a major usability and debugging upgrade.

The transcript then traces how DeepSeek achieved this capability: a shorter safety cycle than OpenAI's, plus a training approach labeled "R1-Z" (i.e., DeepSeek-R1-Zero), which starts from "zero" supervised training data and relies on reinforcement learning. The analogy is AlphaZero's reinforcement-learning success at mastering Go, and the key claim is that reinforcement learning produced an emergent behavior: longer thinking time leading to better outcomes without explicit instruction. The model is also described as part of a broader release strategy: DeepSeek reportedly released six smaller distilled models, ranging from 70B down to 1.5B parameters, with the smallest potentially runnable on a phone.

Finally, the transcript shifts from claims to implementation, walking through how to build an app using DeepSeek’s platform and API. Steps include creating a DeepSeek account, topping up a small amount (the transcript suggests $2 is enough given low token costs), generating an API key, and using Python with the OpenAI-compatible client library. The example prompt asks for high-leverage actions to prepare for a post-AGI world, then sets the model to “DeepSeek Reasoner” (to avoid defaulting to a less capable DeepSeek V3). The walkthrough also adds token streaming so reasoning output appears live in the console.
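Under DeepSeek's OpenAI-compatible interface, those steps reduce to a few lines. A minimal sketch, assuming the API key is stored in a `DEEPSEEK_API_KEY` environment variable and the documented base URL and model name (`deepseek-reasoner`) are current:

```python
import os

def build_request(question: str) -> dict:
    """Chat-completion arguments for DeepSeek's reasoning model.

    Naming the model explicitly matters: "deepseek-chat" (V3) is the
    common default in examples, and it skips the visible reasoning.
    """
    return {
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": question}],
    }

if os.environ.get("DEEPSEEK_API_KEY"):  # only make the call when a key is configured
    from openai import OpenAI  # DeepSeek reuses the OpenAI client library

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    response = client.chat.completions.create(**build_request(
        "What are the highest-leverage actions to prepare for a post-AGI world?"
    ))
    message = response.choices[0].message
    print("REASONING:\n", message.reasoning_content)  # the visible chain of thought
    print("ANSWER:\n", message.content)               # the final answer only
```

Per DeepSeek's documentation, the reply carries the reasoning and the final answer in separate fields (`reasoning_content` and `content`), which is what makes the "watch it think" experience possible.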

To turn a single response into a “team of agents,” the transcript uses multi-round conversation support from DeepSeek’s documentation and then implements a second agent in Cursor that takes the first agent’s “content” and asks follow-up questions—producing a more structured day-to-day plan. The result is a practical schedule (daily and weekly activities) for building AI literacy and resilience.
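The chaining itself is just message-list plumbing. A sketch assuming DeepSeek's documented multi-round format, where only the first reply's `content` (never its `reasoning_content`) is fed back as an assistant turn; the prompt strings are illustrative:

```python
def second_agent_messages(first_question: str, first_answer: str,
                          follow_up: str) -> list[dict]:
    """Build the message list for the follow-up agent.

    The first agent's final `content` becomes an assistant turn; its
    reasoning_content is deliberately left out of the context, matching
    DeepSeek's multi-round conversation documentation.
    """
    return [
        {"role": "user", "content": first_question},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": follow_up},
    ]

messages = second_agent_messages(
    "What are high-leverage actions to prepare for a post-AGI world?",
    "1. Build AI literacy. 2. Develop resilient skills.",  # first agent's content
    "Turn these strategies into a concrete daily and weekly plan.",
)
# `messages` is then passed to a second chat.completions.create(...) call.
```

The second call is otherwise identical to the first; the "team of agents" is simply two requests sharing one growing message list.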

Overall, the core message is that DeepSeek R1’s combination of open availability, visible reasoning, and steeply lower inference costs makes it realistic for solo developers to compete with larger companies—especially by embedding it into agentic workflows and productivity tools.

Cornell Notes

DeepSeek R1 is presented as an open-source reasoning model that matches OpenAI’s o1-level performance while costing far less: $0.55 per million input tokens and $2.2 per million output tokens versus o1’s $15 and $60. A key differentiator is transparency—reasoning output is shown (including mistakes), unlike o1 where reasoning tokens are not visible to users. The transcript attributes the capability to reinforcement learning, including a “zero supervised data” approach (R1-Z) that starts from scratch and learns to spend more time thinking when it improves outcomes. It then demonstrates how to call DeepSeek Reasoner via an OpenAI-compatible Python client, enable token streaming, and build a two-agent workflow where a second agent uses the first agent’s answer to generate a day-to-day preparation plan for a post-AGI world.

Why does the transcript treat token cost as a decisive advantage for DeepSeek R1?

It compares per-token pricing directly: OpenAI o1 costs $15 per 1M input tokens and $60 per 1M output tokens, while DeepSeek R1 costs $0.55 per 1M input tokens and $2.2 per 1M output tokens. That works out to roughly 27x cheaper at the same claimed performance level, which makes "reasoning-heavy" prompts and multi-step agent workflows affordable for individuals rather than only well-funded teams.

What practical difference does “visible reasoning” create compared with o1?

The transcript claims DeepSeek R1 exposes the model’s reasoning tokens/chain-of-thought-style process, including mistakes, so users can see how the model arrives at an answer. It contrasts this with o1/o1-preview, where users pay for reasoning tokens but cannot view the reasoning process, limiting debugging and interpretability.

How does reinforcement learning (and R1-Z) factor into the model’s performance?

The transcript says DeepSeek used reinforcement learning and introduced an approach labeled R1-Z, meaning training starts from “zero” supervised data rather than learning from human-labeled examples. It claims the model autonomously learned an emergent strategy: spending more time thinking improves outcomes, without being explicitly told to do so.

What does the implementation walkthrough require to call DeepSeek R1 from Python?

It uses the DeepSeek platform to create an account, top up a small balance, then generate an API key. In code, it installs an OpenAI-compatible client library, initializes the client with the API key and the DeepSeek Reasoner endpoint/URL, and constructs a messages array. A crucial step is setting the model to “DeepSeek Reasoner” so it doesn’t fall back to DeepSeek V3.

How is token streaming used, and why does it matter in the example?

The walkthrough adds stream=True and prints chunks as they arrive, so reasoning output appears live in the terminal. It contrasts this with a non-streaming approach that leaves users in the dark until the full response completes, which is especially noticeable when reasoning takes longer.
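A sketch of that loop, assuming the delta fields DeepSeek documents for streamed reasoning (`reasoning_content` for the thinking tokens, `content` for the final answer):

```python
def drain_stream(chunks):
    """Print streamed deltas as they arrive; return (reasoning, answer).

    Works on any iterable of OpenAI-style chunks, e.g. the result of
    client.chat.completions.create(model="deepseek-reasoner",
                                   messages=..., stream=True).
    """
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):   # live "thinking" tokens
            print(delta.reasoning_content, end="", flush=True)
            reasoning_parts.append(delta.reasoning_content)
        elif getattr(delta, "content", None):           # final-answer tokens
            print(delta.content, end="", flush=True)
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)
```

Printing with `end=""` and `flush=True` is what produces the live, token-by-token effect in the console instead of one buffered dump at the end.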

How does the transcript turn one model response into a multi-agent workflow?

It first generates an initial answer (with reasoning context and final content). Then it adds a second agent that takes the first agent’s content and asks follow-up questions—e.g., converting high-level strategies into a day-to-day plan for the years leading up to AGI. The second agent is prompted to use the first agent’s output explicitly, creating a chained, team-like interaction.

Review Questions

  1. What pricing numbers in the transcript support the claim that DeepSeek R1 is about 27x cheaper than o1?
  2. Why does the transcript insist on selecting “DeepSeek Reasoner” rather than letting the call default to DeepSeek V3?
  3. In the two-agent setup, what information from the first agent is reused by the second agent, and how does that change the final output?

Key Points

  1. DeepSeek R1 is framed as open-source reasoning capability comparable to OpenAI o1, arriving shortly after o1's release.
  2. Token pricing is a major differentiator: DeepSeek R1 is presented as roughly 27x cheaper than o1 using the transcript's per-million-token numbers.
  3. DeepSeek R1's reasoning output is described as visible to users, unlike o1 where reasoning tokens are not shown.
  4. The training approach is described as reinforcement learning, including an R1-Z "zero supervised data" setup that starts from scratch.
  5. The implementation uses an OpenAI-compatible Python client, with the model explicitly set to "DeepSeek Reasoner" and token streaming enabled for live output.
  6. A multi-agent workflow is built by chaining two agents: the second agent consumes the first agent's content to generate a more actionable plan.

Highlights

DeepSeek R1’s pricing is cited as $0.55 per million input tokens and $2.2 per million output tokens—about 27x cheaper than o1 at $15/$60.
Visible reasoning is presented as a usability breakthrough: users can see the model’s reasoning process and mistakes rather than only the final answer.
R1-Z is described as starting from zero supervised training data and relying on reinforcement learning to discover better thinking strategies.
The walkthrough demonstrates streaming token-by-token output (stream=True) so reasoning appears live instead of waiting for completion.
A two-agent chain converts abstract “post-AGI preparation” advice into a concrete daily schedule.
