DeepSeek R1 for Structured Agents
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
DeepSeek R1 can’t be relied on for function calling or schema-enforced JSON, so structured-agent integrations need a workaround.
Briefing
DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no guaranteed JSON, and no structured schema support. The workaround is to route R1 through an orchestration pattern where a separate “formatter” or “orchestrator” model converts R1’s free-form response into the exact structured fields an agent needs (title, bullets, final answer, and optionally reasoning). This matters because it lets developers keep using structured-agent workflows (like Pydantic AI) even while reasoning models lag behind in agent integration features.
A key setup change comes from using DeepSeek’s API in a way that mirrors OpenAI’s interface, then wiring it into Pydantic AI. The transcript shows two DeepSeek models used side-by-side: DeepSeek V3 for normal structured agent behavior with Pydantic, and DeepSeek R1 for reasoning. When DeepSeek V3 is used with a Pydantic “response type” and a search tool, it cleanly generates multiple search queries and returns a structured report with fields like title, main content, and bullet points. That path works because the model integration supports the structured-output expectations.
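The happy path above can be sketched as follows. This is a minimal illustration, not the transcript’s exact code: the schema fields (title, main content, bullet points) come from the transcript, but the `Report` dataclass, the `parse_report` helper, and the commented Pydantic AI wiring are assumptions, shown here with a local stand-in for a model response so the validation step is concrete.

```python
# Sketch of the structured "happy path": DeepSeek V3 behind an
# OpenAI-compatible endpoint, returning a report matching a schema.
# The Report fields follow the transcript; everything else is illustrative.
from dataclasses import dataclass, field

@dataclass
class Report:
    title: str
    main_content: str
    bullet_points: list[str] = field(default_factory=list)

# With Pydantic AI, the agent would be wired up roughly like:
#   from pydantic_ai import Agent
#   agent = Agent(model, result_type=Report, tools=[search_tool])
# with the OpenAI-style client pointed at DeepSeek's API base URL.
def parse_report(payload: dict) -> Report:
    """Validate the fields a structured-output model is expected to return."""
    return Report(
        title=str(payload["title"]),
        main_content=str(payload["main_content"]),
        bullet_points=[str(b) for b in payload.get("bullet_points", [])],
    )

# Stand-in for a well-formed DeepSeek V3 structured response:
sample = {
    "title": "DeepSeek R1 Overview",
    "main_content": "R1 is a reasoning model...",
    "bullet_points": ["No function calling", "No schema-enforced JSON"],
}
report = parse_report(sample)
print(report.title)
```

With DeepSeek V3 this parse step succeeds because the model honors the structured-output contract; with R1, as described next, the payload cannot be trusted to have these keys at all.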
Switching only the model to DeepSeek R1 breaks the structured-output contract: R1 doesn’t support function calling or schema-constrained JSON output. If the agent asks for a structured result type, R1 returns text that a human can read, but it can’t be reliably parsed into the expected fields. The transcript demonstrates that removing the structured-output constraints allows R1 to still follow the prompt and produce content that includes the requested sections—title, bullets, and a final answer—just not in a machine-enforceable structure.
To restore structure, the “hack” is a two-call pipeline. First, DeepSeek R1 generates the reasoning and the formatted-looking content. Then a cheap model—Gemini 1.5 Flash—is used purely as a formatting assistant: it receives R1’s output plus a target schema description, and it returns consistent structured data that Pydantic AI can consume. This adds latency and cost, but the formatter call is positioned as fast and inexpensive, while R1 remains the reasoning engine.
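The two-call pipeline can be sketched as below. The helper names `call_r1` and `call_formatter` are hypothetical stand-ins for the real API calls (DeepSeek R1 and Gemini 1.5 Flash respectively), and the schema description text is an assumption; only the prompt construction and the two-step flow are taken from the transcript’s description.

```python
# Two-call pipeline: R1 reasons in free form, a cheap model formats.
import json

# Illustrative schema description handed to the formatter model:
SCHEMA_DESCRIPTION = (
    'Return JSON with exactly these keys: "title" (string), '
    '"bullet_points" (list of strings), "final_answer" (string).'
)

def build_formatting_prompt(raw_r1_output: str) -> str:
    """Wrap R1's free-form text with the target schema for the formatter."""
    return (
        "Reformat the text below into the requested structure. "
        "Do not add new information.\n\n"
        f"{SCHEMA_DESCRIPTION}\n\n--- TEXT ---\n{raw_r1_output}"
    )

def run_pipeline(question: str, call_r1, call_formatter) -> dict:
    """Call 1: R1 does the reasoning. Call 2: a cheap model emits strict JSON."""
    raw = call_r1(question)                       # expensive reasoning step
    formatted = call_formatter(build_formatting_prompt(raw))
    return json.loads(formatted)                  # an agent framework would validate this

# Stubbed usage (replace the lambdas with real API calls):
result = run_pipeline(
    "Why is the sky blue?",
    call_r1=lambda q: "Title: Sky color\n- Rayleigh scattering\nAnswer: scattering",
    call_formatter=lambda p: json.dumps({
        "title": "Sky color",
        "bullet_points": ["Rayleigh scattering"],
        "final_answer": "Blue light scatters more than red.",
    }),
)
print(result["title"])
```

Keeping the formatter call cheap is the point of the design: the expensive reasoning happens once in R1, and the second call only restructures text it is handed.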
The transcript also highlights a subtle but important detail about R1 outputs: the API returns separate components for “content” (the answer) and “reasoning content” (the chain-of-thought-style reasoning). For multi-turn conversations, reasoning content is not meant to be resent every time; it should be captured when returned and stored or reused appropriately. To include reasoning in the final structured output, the workflow extracts both parts, wraps the reasoning in tags, concatenates them with the answer content, and sends the combined text to the formatting model.
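A sketch of that extraction step, assuming the message carries the two fields under the names `content` and `reasoning_content` as described above; the `<reasoning>` tag wrapper is the transcript’s convention, while the helper name and the history-handling line are illustrative.

```python
# Combine R1's two response components before handing them to the formatter.
def combine_r1_response(message: dict) -> str:
    """Wrap reasoning in tags and prepend it to the answer content."""
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    if reasoning:
        return f"<reasoning>\n{reasoning}\n</reasoning>\n\n{answer}"
    return answer

msg = {"reasoning_content": "Step 1: consider scattering...", "content": "Final answer."}
combined = combine_r1_response(msg)   # goes to the formatting model

# In a multi-turn conversation, only the answer content is resent as
# history; the reasoning is captured once and stored separately.
history_entry = {"role": "assistant", "content": msg["content"]}
print(combined.startswith("<reasoning>"))
```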
Finally, an alternative orchestration approach treats DeepSeek R1 as a tool inside a larger agent loop. Gemini 1.5 Flash orchestrates: it calls a “get reasoning answers” tool (which fetches R1’s reasoning + answer), uses a search tool to gather sources, then synthesizes and formats the final report. A practical example—requesting a report on “GRPO RL” in DeepSeek R1—shows that R1 can struggle with acronym disambiguation when it’s asked to generate search keywords without context, but it performs better once search results provide grounding. The takeaway: use R1 where reasoning synthesis matters, and rely on simpler models or search-grounded steps for tasks like keyword generation and acronym interpretation.
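The orchestration pattern can be sketched as a simple tool dispatcher. Both tool bodies are stubs standing in for real API calls (DeepSeek R1 and a web-search backend), and the dispatcher is an assumption; in practice the orchestrator model’s own function-calling interface would decide which tool to invoke and with what arguments.

```python
# R1 as a tool inside a larger agent loop driven by a cheap orchestrator.
def get_reasoning_answer(question: str) -> str:
    """Tool: would call DeepSeek R1 and return its reasoning + answer (stubbed)."""
    return f"[R1 reasoning + answer for: {question}]"

def web_search(query: str) -> list[str]:
    """Tool: would return grounded sources for a query (stubbed)."""
    return [f"source about {query}"]

TOOLS = {"get_reasoning_answer": get_reasoning_answer, "web_search": web_search}

def dispatch(tool_name: str, argument: str):
    """The orchestrator model emits (tool_name, argument); we route it."""
    return TOOLS[tool_name](argument)

# A grounded flow for the "GRPO RL" example: search first so the acronym
# is disambiguated before R1 is asked to synthesize.
sources = dispatch("web_search", "GRPO RL DeepSeek")
answer = dispatch("get_reasoning_answer", f"Report on GRPO RL. Sources: {sources}")
print(answer)
```

Ordering matters here: running the search tool before the reasoning tool is what gives R1 the grounding it lacked when asked to invent search keywords for an ambiguous acronym on its own.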
Cornell Notes
DeepSeek R1 delivers strong reasoning, but it doesn’t natively support function calling or schema-enforced structured outputs (like JSON) that agent frameworks expect. A workable pattern is to let R1 produce free-form content, then use a second model (e.g., Gemini 1.5 Flash) as a formatting assistant that converts R1’s output into a strict structure (title, bullets, final answer, and optionally reasoning). The DeepSeek API can return reasoning content separately from answer content, so capturing both parts matters if you want them in the final structured result. Another option is orchestration: Gemini 1.5 Flash can call R1 as a tool, run search for grounded sources, and then synthesize and format the report. These approaches keep structured-agent workflows alive despite missing native integration features.
- Why does DeepSeek R1 break typical structured-agent workflows that work with DeepSeek V3?
- What is the two-call “formatter” hack, and how does it restore structured outputs?
- How does the DeepSeek API’s separation of “content” and “reasoning content” affect implementation?
- What orchestration pattern treats DeepSeek R1 as a tool instead of a direct structured-output model?
- Why did the “GRPO RL” example show different behavior with and without search?
Review Questions
- In a Pydantic AI agent, what specific capability is missing from DeepSeek R1 that forces a workaround (and what breaks when you try to use a structured response type directly)?
- Describe how you would include DeepSeek R1’s reasoning content in the final structured output without violating multi-turn reasoning handling.
- When would you prefer the two-call formatting approach versus the tool-orchestration approach for agent design?
Key Points
1. DeepSeek R1 can’t be relied on for function calling or schema-enforced JSON, so structured-agent integrations need a workaround.
2. Using DeepSeek V3 with Pydantic AI demonstrates the “happy path” where structured response types and tool calls work cleanly.
3. A practical fix is a two-step pipeline: DeepSeek R1 generates free-form sections, then Gemini 1.5 Flash formats them into a strict schema.
4. DeepSeek’s API returns answer content and reasoning content separately; capturing and combining both is necessary if reasoning must appear in the final structured result.
5. In multi-round conversations, reasoning content should be stored when returned rather than resent each turn.
6. Orchestration can treat DeepSeek R1 as a tool inside a larger agent loop that performs search first, then uses R1 for synthesis and final formatting.
7. For tasks like acronym/keyword generation, grounding via search (or a simpler model) can outperform relying on R1’s guessing.