PydanticAI - The NEW Agent Builder on the Block
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
PydanticAI positions itself as a new, Pydantic-first agent and LLM application framework built to make model outputs reliably conform to structured schemas—so results can be used programmatically instead of treated as free-form text. The core idea is familiar from earlier “LLM + validation” patterns: define a data model, validate the model’s response against it, and—when needed—prompt the model to correct its output. What’s new is the shift from “plug Pydantic into an agent framework” to “build an LLM framework on top of Pydantic,” with the surrounding features—system prompts, tool use, chat history, and RAG-style workflows—designed to work with that validation layer.
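The define-validate-reprompt loop described above can be sketched in plain Python. This is a stand-alone illustration of the pattern, not the PydanticAI API: the stub model, prompt wording, and schema check are all hypothetical, and PydanticAI wires an equivalent loop up for you.

```python
import json

# Hypothetical stand-in for an LLM call: the first reply is free-form text,
# the retry (after seeing the validation error) is valid JSON.
_replies = iter([
    "Sure! The city is Paris, in France.",        # fails validation
    '{"city": "Paris", "country": "France"}',     # corrected structured output
])

def fake_model(prompt: str) -> str:
    return next(_replies)

def validate(raw: str) -> dict:
    """Minimal schema check: must be JSON with string 'city' and 'country'."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON
    for field in ("city", "country"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"missing or non-string field: {field!r}")
    return data

def run_with_retries(prompt: str, max_retries: int = 2) -> dict:
    for _ in range(max_retries + 1):
        raw = fake_model(prompt)
        try:
            return validate(raw)
        except ValueError as err:
            # Re-prompt with the validation error so the model can self-correct.
            prompt = f"{prompt}\n\nYour last reply failed validation ({err}); return valid JSON."
    raise RuntimeError("model never produced a valid response")

result = run_with_retries("Which city should I visit? Reply as JSON with city and country.")
print(result)  # {'city': 'Paris', 'country': 'France'}
```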
The pitch matters because most agent frameworks already rely on schema validation somewhere in the pipeline, but teams still face friction: keeping outputs consistent, wiring structured results into downstream code, and managing production control flow. PydanticAI claims to address these issues with model-agnostic support (OpenAI, Google Vertex AI/Gemini, Groq, and planned Anthropic support), type-safe design, and “vanilla Python” control flow for agent composition. That Pythonic approach is presented as a production advantage: rather than relying on complex orchestration abstractions, developers can structure state and workflow logic directly in ordinary Python.
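What “vanilla Python control flow” means in practice is that routing between agents is ordinary if/else rather than a graph DSL. A minimal sketch with hypothetical stub agents (plain callables standing in for LLM-backed agents, not PydanticAI objects):

```python
# Stub "agents": plain callables standing in for LLM-backed agents.
def classify_intent(message: str) -> str:
    return "weather" if "weather" in message.lower() else "chat"

def weather_agent(message: str) -> str:
    return "It is 18C and cloudy."

def chat_agent(message: str) -> str:
    return "Happy to help!"

def handle(message: str) -> str:
    # Ordinary Python branching instead of an orchestration graph:
    # state, retries, and routing live in code you can step through.
    if classify_intent(message) == "weather":
        return weather_agent(message)
    return chat_agent(message)

print(handle("What's the weather in London?"))  # It is 18C and cloudy.
```

Because the workflow is plain code, changing it for production (adding a branch, a timeout, or a fallback model) is a normal refactor rather than a framework configuration change.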
From there, the walkthrough demonstrates three practical capabilities. First is straightforward chat prompting: create an agent with a system prompt and a user prompt, then swap models (e.g., using Gemini 1.5 Flash) and adjust the system prompt dynamically via injection. Second is structured output. By defining a Pydantic class for the expected response (example fields include city, country, and a reason), the model returns neatly formatted data. The example extends the schema to include a “famous person from the city,” showing how adding fields to the schema changes the output contract while keeping the rest of the workflow intact. The results are also inspectable—down to token usage—and accessible as Python objects or JSON.
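The schema side of the structured-output example can be shown with plain Pydantic (the exact PydanticAI parameter names for attaching a result schema vary by version, so this sketch only demonstrates the contract; the `Capital` class name and sample JSON are illustrative):

```python
from pydantic import BaseModel, ValidationError

class Capital(BaseModel):
    city: str
    country: str
    reason: str
    famous_person: str  # added field: extends the output contract

# Pretend this JSON came back from the model; Pydantic validates it.
raw = ('{"city": "Paris", "country": "France", '
       '"reason": "Rich history and culture.", "famous_person": "Victor Hugo"}')
result = Capital.model_validate_json(raw)

print(result.city)               # access as a Python object
print(result.model_dump_json())  # or serialize back to JSON

# A response missing the new field now fails validation,
# which is what triggers a correction round-trip to the model.
try:
    Capital.model_validate_json('{"city": "Paris", "country": "France", "reason": "x"}')
except ValidationError as err:
    print("validation failed with", err.error_count(), "error(s)")
```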
Third is tool use / function calling. A weather agent is built with two tools: one tool resolves a location description into latitude and longitude, and a second tool uses those coordinates to fetch weather details (temperature and a natural-language description derived from API codes). When API keys are absent, the tools return dummy responses, but the tool-calling trace still shows the model deciding which tools to call and in what order. The example demonstrates multiple tool calls in one run (London and Singapore), followed by a final natural-language response that combines the tool outputs.
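The two-tool pipeline with dummy fallbacks can be sketched as below. The tool names, environment-variable names, and dummy values are all hypothetical; in PydanticAI the model itself decides which tool to call and in what order, whereas here the chaining is done by hand to show the data flow:

```python
import os

def get_lat_lng(location: str) -> dict:
    """Tool 1: resolve a location description to coordinates."""
    if not os.environ.get("GEO_API_KEY"):      # no key: return a dummy response
        return {"lat": 51.5, "lng": -0.1}
    ...  # a real geocoding API call would go here

def get_weather(lat: float, lng: float) -> dict:
    """Tool 2: fetch weather details for coordinates."""
    if not os.environ.get("WEATHER_API_KEY"):  # no key: return a dummy response
        return {"temperature": "21C", "description": "Mostly cloudy"}
    ...  # a real weather API call would go here

# Mirror the London/Singapore run from the video: one tool resolves the
# location, the second consumes its coordinates.
for city in ("London", "Singapore"):
    coords = get_lat_lng(city)
    weather = get_weather(coords["lat"], coords["lng"])
    print(f"{city}: {weather['temperature']}, {weather['description']}")
```

The key dependency is visible in the signatures: `get_weather` cannot run until `get_lat_lng` has produced `lat`/`lng`, which is exactly the ordering the tool-calling trace shows the model working out on its own.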
Overall, PydanticAI is framed as a compact alternative to heavier agent stacks: schema-driven reliability, dynamic prompt/tool injection, explicit message history management, and streaming-friendly structured output—implemented in a way that aims to stay readable and easy to adapt for real deployments. The accompanying mention of Logfire adds an observability option for tracking inputs and outputs, though it’s treated as optional rather than required.
Cornell Notes
PydanticAI builds an LLM agent framework around Pydantic so model outputs can be validated against explicit schemas. That approach targets a common pain point in agent development: free-form text is hard to use reliably in downstream code, so structured results need enforcement. The framework supports multiple model providers (OpenAI, Gemini/Vertex AI, Groq, with Anthropic support coming) and emphasizes type safety and “vanilla Python” control flow for easier production adaptation. In practice, it supports dynamic system prompts, structured output via Pydantic classes (including JSON/Python access), chat-style message history, and tool/function calling with dependency injection. The weather example shows the model making multiple tool calls and then producing a final consolidated response.
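The message-history mechanic summarized above (pass prior messages into the next call, even when switching models) can be sketched with hypothetical stub models; PydanticAI exposes the same idea through its result message accessors and a message-history argument, whose exact names vary by version:

```python
# Stub "models": each just reports how much context it received.
def model_a(messages: list) -> str:
    return f"model_a saw {len(messages)} messages"

def model_b(messages: list) -> str:
    return f"model_b saw {len(messages)} messages"

history: list = []

def run(model, user_text: str) -> str:
    # Append the user turn, call the model with the FULL history,
    # then record the assistant turn so later calls keep context.
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

run(model_a, "Hi, my name is Sam.")
# Switch models mid-conversation; the accumulated history carries over.
print(run(model_b, "What is my name?"))  # model_b saw 3 messages
```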
Why does schema validation matter for LLM agents, and how does PydanticAI use it?
How does PydanticAI handle dynamic prompting and model switching during a conversation?
What does “structured output” look like in the examples, and what changes when the schema changes?
How does tool use / function calling work, and what is the role of dependencies?
What does the weather tool-calling trace reveal about multi-step reasoning in practice?
Review Questions
- How would you design a Pydantic response schema for an agent that must return both user-facing text and machine-readable fields?
- What are the practical benefits of using “vanilla Python” control flow for agent composition compared with more abstract orchestration layers?
- In the weather example, what information must flow from the first tool to the second tool, and how does dependency injection support that pipeline?
Key Points
1. PydanticAI aims to make LLM agent outputs reliably conform to Pydantic-defined schemas so downstream code can consume results safely.
2. The framework is built around Pydantic rather than treating validation as an add-on inside another agent framework.
3. Model support is presented as model-agnostic, with OpenAI, Google Vertex AI/Gemini, and Groq supported and Anthropic support described as forthcoming.
4. Structured outputs are produced by defining Pydantic classes for the expected response fields, and the results can be retrieved as Python objects or JSON.
5. System prompts and tools can be injected dynamically, enabling changes in behavior without rebuilding the entire agent graph.
6. Chat-style message history can be passed into subsequent calls, including switching LLMs mid-conversation while preserving context.
7. Tool/function calling is implemented with dependency injection, and multi-step tool sequences are visible through tool-call traces and token usage metadata.