DeepSeek R1 for Structured Agents
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
DeepSeek R1 can’t be relied on for function calling or schema-enforced JSON, so structured-agent integrations need a workaround.
Briefing
DeepSeek’s R1 reasoning model can’t natively produce the structured, tool-friendly outputs that most agent frameworks rely on—no function calling, no guaranteed JSON, and no structured schema support. The workaround is to route R1 through an orchestration pattern where a separate “formatter” or “orchestrator” model converts R1’s free-form response into the exact structured fields an agent needs (title, bullets, final answer, and optionally reasoning). This matters because it lets developers keep using structured-agent workflows (like Pydantic AI) even while reasoning models lag behind in agent integration features.
A key setup change comes from using DeepSeek’s API in a way that mirrors OpenAI’s interface, then wiring it into Pydantic AI. The transcript shows two DeepSeek models used side-by-side: DeepSeek V3 for normal structured agent behavior with Pydantic, and DeepSeek R1 for reasoning. When DeepSeek V3 is used with a Pydantic “response type” and a search tool, it cleanly generates multiple search queries and returns a structured report with fields like title, main content, and bullet points. That path works because the model integration supports the structured-output expectations.
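The happy path above can be sketched as follows. This is a minimal illustration, not the transcript’s exact code: the schema fields (title, main content, bullet points) come from the transcript, but the `Report` dataclass, the `parse_report` helper, and the commented Pydantic AI wiring are assumptions, shown here with a local stand-in for a model response so the validation step is concrete.

```python
# Sketch of the structured "happy path": DeepSeek V3 behind an
# OpenAI-compatible endpoint, returning a report matching a schema.
# The Report fields follow the transcript; everything else is illustrative.
from dataclasses import dataclass, field

@dataclass
class Report:
    title: str
    main_content: str
    bullet_points: list[str] = field(default_factory=list)

# With Pydantic AI, the agent would be wired up roughly like:
#   from pydantic_ai import Agent
#   agent = Agent(model, result_type=Report, tools=[search_tool])
# with the OpenAI-style client pointed at DeepSeek's API base URL.
def parse_report(payload: dict) -> Report:
    """Validate the fields a structured-output model is expected to return."""
    return Report(
        title=str(payload["title"]),
        main_content=str(payload["main_content"]),
        bullet_points=[str(b) for b in payload.get("bullet_points", [])],
    )

# Stand-in for a well-formed DeepSeek V3 structured response:
sample = {
    "title": "DeepSeek R1 Overview",
    "main_content": "R1 is a reasoning model...",
    "bullet_points": ["No function calling", "No schema-enforced JSON"],
}
report = parse_report(sample)
print(report.title)
```

With DeepSeek V3 this parse step succeeds because the model honors the structured-output contract; with R1, as described next, the payload cannot be trusted to have these keys at all.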
Switching only the model to DeepSeek R1 breaks the structured-output contract: R1 doesn’t support function calling or schema-constrained JSON output. If the agent asks for a structured result type, R1 returns text that a human can read, but it can’t be reliably parsed into the expected fields. The transcript demonstrates that removing the structured-output constraints allows R1 to still follow the prompt and produce content that includes the requested sections—title, bullets, and a final answer—just not in a machine-enforceable structure.
To restore structure, the “hack” is a two-call pipeline. First, DeepSeek R1 generates the reasoning and the formatted-looking content. Then a cheap model—Gemini 1.5 Flash—is used purely as a formatting assistant: it receives R1’s output plus a target schema description, and it returns consistent structured data that Pydantic AI can consume. This adds latency and cost, but the formatter call is positioned as fast and inexpensive, while R1 remains the reasoning engine.
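The two-call pipeline can be sketched as below. The helper names `call_r1` and `call_formatter` are hypothetical stand-ins for the real API calls (DeepSeek R1 and Gemini 1.5 Flash respectively), and the schema description text is an assumption; only the prompt construction and the two-step flow are taken from the transcript’s description.

```python
# Two-call pipeline: R1 reasons in free form, a cheap model formats.
import json

# Illustrative schema description handed to the formatter model:
SCHEMA_DESCRIPTION = (
    'Return JSON with exactly these keys: "title" (string), '
    '"bullet_points" (list of strings), "final_answer" (string).'
)

def build_formatting_prompt(raw_r1_output: str) -> str:
    """Wrap R1's free-form text with the target schema for the formatter."""
    return (
        "Reformat the text below into the requested structure. "
        "Do not add new information.\n\n"
        f"{SCHEMA_DESCRIPTION}\n\n--- TEXT ---\n{raw_r1_output}"
    )

def run_pipeline(question: str, call_r1, call_formatter) -> dict:
    """Call 1: R1 does the reasoning. Call 2: a cheap model emits strict JSON."""
    raw = call_r1(question)                       # expensive reasoning step
    formatted = call_formatter(build_formatting_prompt(raw))
    return json.loads(formatted)                  # an agent framework would validate this

# Stubbed usage (replace the lambdas with real API calls):
result = run_pipeline(
    "Why is the sky blue?",
    call_r1=lambda q: "Title: Sky color\n- Rayleigh scattering\nAnswer: scattering",
    call_formatter=lambda p: json.dumps({
        "title": "Sky color",
        "bullet_points": ["Rayleigh scattering"],
        "final_answer": "Blue light scatters more than red.",
    }),
)
print(result["title"])
```

Keeping the formatter call cheap is the point of the design: the expensive reasoning happens once in R1, and the second call only restructures text it is handed.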
The transcript also highlights a subtle but important detail about R1 outputs: the API returns separate components for “content” (the answer) and “reasoning content” (the chain-of-thought-style reasoning). For multi-turn conversations, reasoning content is not meant to be resent every time; it should be captured when returned and stored or reused appropriately. To include reasoning in the final structured output, the workflow extracts both parts, wraps the reasoning in tags, concatenates them with the answer content, and sends the combined text to the formatting model.
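A sketch of that extraction step, assuming the message carries the two fields under the names `content` and `reasoning_content` as described above; the `<reasoning>` tag wrapper is the transcript’s convention, while the helper name and the history-handling line are illustrative.

```python
# Combine R1's two response components before handing them to the formatter.
def combine_r1_response(message: dict) -> str:
    """Wrap reasoning in tags and prepend it to the answer content."""
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    if reasoning:
        return f"<reasoning>\n{reasoning}\n</reasoning>\n\n{answer}"
    return answer

msg = {"reasoning_content": "Step 1: consider scattering...", "content": "Final answer."}
combined = combine_r1_response(msg)   # goes to the formatting model

# In a multi-turn conversation, only the answer content is resent as
# history; the reasoning is captured once and stored separately.
history_entry = {"role": "assistant", "content": msg["content"]}
print(combined.startswith("<reasoning>"))
```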
Finally, an alternative orchestration approach treats DeepSeek R1 as a tool inside a larger agent loop. Gemini 1.5 Flash orchestrates: it calls a “get reasoning answers” tool (which fetches R1’s reasoning + answer), uses a search tool to gather sources, then synthesizes and formats the final report. A practical example—requesting a report on “GRPO RL” in DeepSeek R1—shows that R1 can struggle with acronym disambiguation when it’s asked to generate search keywords without context, but it performs better once search results provide grounding. The takeaway: use R1 where reasoning synthesis matters, and rely on simpler models or search-grounded steps for tasks like keyword generation and acronym interpretation.
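The orchestration pattern can be sketched as a simple tool dispatcher. Both tool bodies are stubs standing in for real API calls (DeepSeek R1 and a web-search backend), and the dispatcher is an assumption; in practice the orchestrator model’s own function-calling interface would decide which tool to invoke and with what arguments.

```python
# R1 as a tool inside a larger agent loop driven by a cheap orchestrator.
def get_reasoning_answer(question: str) -> str:
    """Tool: would call DeepSeek R1 and return its reasoning + answer (stubbed)."""
    return f"[R1 reasoning + answer for: {question}]"

def web_search(query: str) -> list[str]:
    """Tool: would return grounded sources for a query (stubbed)."""
    return [f"source about {query}"]

TOOLS = {"get_reasoning_answer": get_reasoning_answer, "web_search": web_search}

def dispatch(tool_name: str, argument: str):
    """The orchestrator model emits (tool_name, argument); we route it."""
    return TOOLS[tool_name](argument)

# A grounded flow for the "GRPO RL" example: search first so the acronym
# is disambiguated before R1 is asked to synthesize.
sources = dispatch("web_search", "GRPO RL DeepSeek")
answer = dispatch("get_reasoning_answer", f"Report on GRPO RL. Sources: {sources}")
print(answer)
```

Ordering matters here: running the search tool before the reasoning tool is what gives R1 the grounding it lacked when asked to invent search keywords for an ambiguous acronym on its own.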
Cornell Notes
DeepSeek R1 delivers strong reasoning, but it doesn’t natively support function calling or schema-enforced structured outputs (like JSON) that agent frameworks expect. A workable pattern is to let R1 produce free-form content, then use a second model (e.g., Gemini 1.5 Flash) as a formatting assistant that converts R1’s output into a strict structure (title, bullets, final answer, and optionally reasoning). The DeepSeek API can return reasoning content separately from answer content, so capturing both parts matters if you want them in the final structured result. Another option is orchestration: Gemini 1.5 Flash can call R1 as a tool, run search for grounded sources, and then synthesize and format the report. These approaches keep structured-agent workflows alive despite missing native integration features.
- Why does DeepSeek R1 break typical structured-agent workflows that work with DeepSeek V3?
- What is the two-call “formatter” hack, and how does it restore structured outputs?
- How does the DeepSeek API’s separation of “content” and “reasoning content” affect implementation?
- What orchestration pattern treats DeepSeek R1 as a tool instead of a direct structured-output model?
- Why did the “GRPO RL” example show different behavior with and without search?
Review Questions
- In a Pydantic AI agent, what specific capability is missing from DeepSeek R1 that forces a workaround (and what breaks when you try to use a structured response type directly)?
- Describe how you would include DeepSeek R1’s reasoning content in the final structured output without violating multi-turn reasoning handling.
- When would you prefer the two-call formatting approach versus the tool-orchestration approach for agent design?
Key Points
1. DeepSeek R1 can’t be relied on for function calling or schema-enforced JSON, so structured-agent integrations need a workaround.
2. Using DeepSeek V3 with Pydantic AI demonstrates the “happy path” where structured response types and tool calls work cleanly.
3. A practical fix is a two-step pipeline: DeepSeek R1 generates free-form sections, then Gemini 1.5 Flash formats them into a strict schema.
4. DeepSeek’s API returns answer content and reasoning content separately; capturing and combining both is necessary if reasoning must appear in the final structured result.
5. In multi-round conversations, reasoning content should be stored when returned rather than resent each turn.
6. Orchestration can treat DeepSeek R1 as a tool inside a larger agent loop that performs search first, then uses R1 for synthesis and final formatting.
7. For tasks like acronym/keyword generation, grounding via search (or a simpler model) can outperform relying on R1’s guessing.