DeepSeek R1 0528 - Better Coding & Tool Calling | Is It Faster Now?
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
DeepSeek R1 0528 adds JSON output and function calling, making it more practical for tool-using coding agents and structured workflows.
Briefing
DeepSeek R1 0528’s update centers on making the model more usable for real-world coding agents by adding support for JSON output and function calling, the capabilities that typically power tool use, structured responses, and agent workflows. The release also claims enhanced front-end behavior, fewer hallucinations, and improved benchmark performance, and the model is available through DeepSeek’s UI and API, with the weights posted to Hugging Face.
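As a concrete illustration, here is a minimal sketch of what function calling might look like against DeepSeek's OpenAI-compatible API. The endpoint URL and the deepseek-reasoner model id are taken from DeepSeek's public documentation; the run_tests tool is a hypothetical example, not something from the video.

```python
# Minimal sketch: function calling via DeepSeek's OpenAI-compatible API.
# Endpoint and model id are taken from DeepSeek's public docs; the
# `run_tests` tool below is a hypothetical example for illustration.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

# Describe a tool the coding agent may call.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, not from the video
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string",
                         "description": "Test file or directory to run."},
            },
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": "Fix the failing test in tests/test_todo.py"}],
    tools=tools,
)

# If the model decided to use the tool, the structured call arrives here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)  # arguments are JSON
```

JSON output is requested through the same interface by passing response_format={"type": "json_object"} to the create call (and mentioning JSON in the prompt, as OpenAI-compatible JSON modes generally require).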
For coding specifically, the update is framed as a meaningful step up. Early community chatter points to substantial gains on coding tasks, and the creator highlights that DeepSeek R1-style “thinking” models can be awkward in practice: they spend time generating reasoning tokens, which slows down inference compared with faster models where “thinking” can be disabled. That tradeoff matters for developers building interactive coding tools, where latency and responsiveness often decide whether an agent feels usable.
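To make that tradeoff concrete, a quick back-of-envelope calculation; the token counts and decode speed below are illustrative assumptions, not figures from the video.

```python
# Back-of-envelope: why reasoning tokens slow down interactive coding tools.
# All numbers are illustrative assumptions, not measurements from the video.
def response_latency(reasoning_tokens: int, answer_tokens: int,
                     tokens_per_second: float) -> float:
    """Seconds until the complete answer arrives at a given decode speed."""
    return (reasoning_tokens + answer_tokens) / tokens_per_second

# A thinking model emitting 1,500 reasoning tokens before a 500-token answer:
print(response_latency(1_500, 500, tokens_per_second=40.0))  # 50.0 s
# The same answer from a model with "thinking" disabled:
print(response_latency(0, 500, tokens_per_second=40.0))      # 12.5 s
```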
One benchmark singled out is LiveCodeBench, described as a holistic, contamination-free evaluation of large language model coding performance. In that leaderboard snapshot, OpenAI models such as o4-mini (including its medium reasoning-effort setting) and o3 appear near the top, while other commercial models such as Sonnet 4 and Opus 4 land lower. Against that backdrop, the updated DeepSeek R1 is reported to score extremely well, potentially even ahead of Gemini 2.5 Pro, though the transcript also cautions against placing full confidence in any single benchmark.
Under the hood, DeepSeek R1 0528 is presented as a “minor version upgrade” that nonetheless delivers a large practical change. The update reportedly deepens reasoning and improves inference by spending more computational resources (more GPUs) and applying algorithmic optimizations during post-training. The exact optimizations aren’t detailed, but the release notes mention system prompt support and note that a “think” tag no longer has to be forced at the start of the output to trigger the thinking pattern.
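In practice that simplifies the prompting flow. A sketch, reusing the client from the earlier example; reasoning_content is the field DeepSeek's API docs describe for exposing the reasoner model's chain-of-thought separately from the answer.

```python
# Sketch: with system prompt support, no "<think>" prefix has to be forced.
# Reuses the client from the earlier example; `reasoning_content` is the
# field DeepSeek's API docs describe for the reasoner's chain-of-thought.
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a debounce helper in JavaScript."},
    ],
)

message = response.choices[0].message
print(message.reasoning_content)  # the model's reasoning, exposed separately
print(message.content)            # the final answer, no <think> scaffolding
```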
A key detail for developers is the distilled variant: the only distilled version mentioned uses an 8B-parameter base model (Qwen3 8B). It is created by distilling chain-of-thought from the full R1 0528 model into that 8B base, with the claim that this pushes performance into the state-of-the-art range for the 8B tier.
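For readers who want to try the distilled variant locally, a standard Hugging Face transformers load might look like the following. The repo id deepseek-ai/DeepSeek-R1-0528-Qwen3-8B is taken from the release; the generation settings are generic placeholders, not tuned recommendations.

```python
# Sketch: loading the distilled 8B variant with Hugging Face transformers.
# The repo id is taken from the release; generation settings are generic
# placeholders, not tuned recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Reverse a linked list in Python."}],
    tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```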
In hands-on testing, the model is fed a large, detailed specification (roughly 4,000–5,000 tokens) for a to-do application and then asked to produce a single-file HTML/CSS/JavaScript landing page. The model spends roughly 23 seconds “thinking” to confirm and summarize the specification, then takes about 19 seconds to plan the landing page and another couple of minutes to output the full code on the free inference setup. The resulting page includes animations, hover effects, pricing sections, social links, and even an image; overall it is described as professional and “picture perfect” for the task, though the tester notes that the model’s verbosity (summarizing what was provided) can be a mixed blessing in interactive workflows.
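One way to reproduce that kind of timing is to stream the response and clock the thinking and answer phases separately. A sketch, reusing the client from above; it assumes DeepSeek's streaming format, where chunks carry reasoning_content during thinking and content once the answer starts, and todo_spec.md is a hypothetical file holding the 4,000–5,000-token specification.

```python
import time

# Sketch: clock the "thinking" phase vs. the answer phase while streaming.
# Reuses the client from above; streamed chunks carry `reasoning_content`
# while the model thinks and `content` once the final answer begins.
# `todo_spec.md` is a hypothetical file holding the large specification.
spec = open("todo_spec.md").read()

start = time.monotonic()
answer_started_at = None

stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user",
               "content": spec + "\n\nBuild a single-file HTML/CSS/JS landing page."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if answer_started_at is None and getattr(delta, "content", None):
        answer_started_at = time.monotonic()  # first non-reasoning token

end = time.monotonic()
if answer_started_at is not None:
    print(f"thinking: {answer_started_at - start:.1f}s, "
          f"answer: {end - answer_started_at:.1f}s")
```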
Overall, DeepSeek R1 0528 is positioned as a stronger open coding model with agent-ready structured output, but with practical latency considerations tied to its reasoning-heavy behavior—especially compared with faster models that can reduce or skip “thinking” tokens.
Cornell Notes
DeepSeek R1 0528 adds two agent-critical features: JSON output support and function calling. Those changes are meant to make the model easier to plug into tool-using coding agents and structured workflow systems. The update also claims better reasoning depth and inference performance through more compute and post-training optimizations, including system prompting support that removes the need for a forced “think” tag. Community and benchmark signals (notably LiveCodeBench) suggest strong coding gains, though benchmark trust is treated cautiously. In practical tests, the model handles large prompts and can generate a polished single-file HTML/CSS/JavaScript landing page, but it is slower due to visible “thinking” and reasoning token generation.
What new capabilities in DeepSeek R1 0528 matter most for building coding agents?
Why might a “thinking” model feel slower for coding tasks in real products?
Which benchmark is used to gauge coding performance, and what’s the caveat?
What changes are described for reasoning behavior and prompting?
How is the distilled 8B model constructed, and why does it matter?
What did the hands-on test reveal about context handling and output quality?
Review Questions
- How do JSON output and function calling change what an LLM can do inside an agentic coding workflow?
- What tradeoff does the transcript describe between reasoning-heavy models and faster coding models in terms of latency?
- Why might system prompting support and removal of a required “think” tag affect how developers integrate DeepSeek R1 into their pipelines?
Key Points
1. DeepSeek R1 0528 adds JSON output and function calling, making it more practical for tool-using coding agents and structured workflows.
2. The update claims improved reasoning depth and inference performance through more GPU compute and post-training algorithmic optimizations.
3. System prompt support is included, and a required “think” tag at the start of the output is no longer needed to force the thinking pattern.
4. Coding performance signals are strong on LiveCodeBench, but the transcript treats any single leaderboard as not fully trustworthy on its own.
5. “Thinking” models can feel slower for coding because they generate extra reasoning tokens; faster models may deliver better interactive latency.
6. A distilled 8B variant is available, built by distilling chain-of-thought from the full R1 0528 model into an 8B base (Qwen3 8B) to target strong performance at a smaller size.
7. In a large-prompt test, the model generated a polished single-file landing page, but the free inference run took minutes, reflecting reasoning overhead.