PydanticAI - Building a Research Agent
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A research agent built with PydanticAI can turn a single question into multiple targeted web searches, run them concurrently, and return a structured, markdown-ready report—while letting developers control output shape and inject time-sensitive context. The core payoff is reliability: instead of getting free-form text, the agent produces a predictable schema (title, main section, and bullet summaries) that can be rendered or further processed downstream.
The setup starts with choosing a search backend. DuckDuckGo is presented as a straightforward option but prone to rate limiting, especially without a proxy. Tavily is offered as a practical alternative: its paid API includes a free allowance of "a thousand free calls a month," making it easier to iterate without throttling. Both search paths support synchronous and asynchronous usage, but the agent is configured for asynchronous calls so multiple searches can run in parallel.
The agent’s structure hinges on three Pydantic-style building blocks. First, a small “search data” class carries parameters like max results per search. Second, a “result type” defines the final structured output: a research title, a research main section, and research bullets, all as strings intended for markdown formatting. Third, a system prompt instructs the model to generate strong keywords for 3–5 searches total, label each query with a query number, and then synthesize the retrieved information into the required schema. The prompt is intentionally adjustable; it can steer the agent toward academic or commercial framing depending on the use case.
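The three building blocks above can be sketched as follows. This is a minimal stdlib illustration, not the actual PydanticAI code from the video: in PydanticAI these would be Pydantic models wired to the agent via its deps and result-type parameters, and the class and field names here (`SearchDataclass`, `ResearchResult`, etc.) are illustrative guesses at the video's naming.

```python
from dataclasses import dataclass

# Dependency object carried into every tool call (illustrative name).
@dataclass
class SearchDataclass:
    max_results: int  # how many results each individual search returns

# Structured output the agent must produce: markdown-ready strings.
@dataclass
class ResearchResult:
    research_title: str    # rendered as the report's top-level heading
    research_main: str     # main body section
    research_bullets: str  # bullet-point summary

# System prompt steering the model toward 3-5 numbered searches.
SYSTEM_PROMPT = (
    "You are a research assistant. For the given question, generate "
    "strong keyword queries, labelling each with a query number, and "
    "use between 3 and 5 searches in total. Synthesize the retrieved "
    "information into the structured research report."
)
```

Because the output shape is a schema rather than free text, downstream code can rely on the three fields always being present.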
A dedicated search tool is then created using a decorator-based interface. The tool accepts the model-generated query and query number, calls Tavily’s search context method, and returns results constrained by the max-results dependency. This tool wiring is what lets the model decide how many searches to run and what each query should be, while the developer keeps tight control over what the tool returns.
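The tool wiring might look like the sketch below. PydanticAI's real interface is a decorator on the agent (with a run context carrying the dependencies), and the real tool calls Tavily; here a stand-in registry and a stubbed search keep the sketch self-contained, and all names are hypothetical.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Deps:
    max_results: int  # enforced ceiling on results per search

# Stand-in for the agent's tool registry; in PydanticAI this role is
# played by the agent's tool decorator.
TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
async def get_search(deps: Deps, query: str, query_number: int) -> dict:
    """Hypothetical tool: in the video this calls Tavily's search-context
    method; here a stub returns canned results capped by max_results."""
    results = [f"result {i} for {query!r}" for i in range(deps.max_results)]
    return {"query_number": query_number, "results": results}
```

The model chooses `query` and `query_number`; the developer-controlled `Deps` caps what comes back, which is the split of responsibilities the transcript emphasizes.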
When asked for “a very detailed bio on Sam Altman,” the agent chooses four searches and runs them asynchronously—each search retrieving three results—before formatting everything into the structured markdown response. The transcript also highlights cost awareness: token usage can climb quickly because multiple searches generate a lot of text for the model to process. The agent can be tuned to be more economical by limiting searches or adjusting prompt guidance.
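The concurrent fan-out described above is, in essence, an `asyncio.gather` over the model-planned queries. A minimal sketch with a stubbed search function (the queries and helper names are invented for illustration):

```python
import asyncio

async def fake_search(query: str, query_number: int, max_results: int = 3):
    # Stand-in for an async Tavily call; sleep(0) mimics awaiting I/O.
    await asyncio.sleep(0)
    return [f"{query} hit {i}" for i in range(max_results)]

async def run_searches(queries):
    # Launch all planned queries at once and await them together,
    # instead of running them one after another.
    return await asyncio.gather(
        *(fake_search(q, n) for n, q in enumerate(queries, start=1))
    )

queries = [
    "Sam Altman biography",
    "Sam Altman OpenAI CEO",
    "Sam Altman Y Combinator",
    "Sam Altman recent news",
]
reports = asyncio.run(run_searches(queries))  # four searches, in parallel
```

Note the cost implication: four queries times three results each is twelve result blobs of text the model must then read, which is why trimming query count or `max_results` is the first economy lever.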
The most consequential refinement comes with time sensitivity. Asking for "the latest AI news" initially yields outdated information because the system prompt lacks a "today's date" reference; the model falls back to its cutoff (shown as October 2023). Injecting a "today" dependency into the system prompt fixes this: the agent now targets "the last few days," decides it only needs three searches, and returns bullets aligned with early December 2024 items such as OpenAI's "12 days of Shipmas" events and rumors about a ChatGPT Pro plan and the text-to-video Sora model.
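The date-injection fix amounts to formatting the current date into the system prompt. A minimal sketch (function name and wording are illustrative; in PydanticAI the date would arrive through a dependency-driven system-prompt function):

```python
from datetime import date

def build_system_prompt(today: date) -> str:
    # Without an explicit date, "latest news" questions drift back to
    # the model's training cutoff.
    return (
        "You are a research assistant. "
        f"Today's date is {today:%Y-%m-%d}. When the user asks for "
        "recent news, restrict searches to the last few days."
    )

prompt = build_system_prompt(date.today())
```

Passing the date in as a dependency, rather than hard-coding it, keeps the prompt current on every run.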
Overall, the approach demonstrates the fundamentals needed for a research agent: structured output via a Pydantic schema, tool-based web retrieval, asynchronous multi-query execution, and dependency-driven prompt injection to keep results current. The next step foreshadows a similar agent pattern combined with RAG and a vector store for grounded answers from curated documents.
Cornell Notes
The agent design uses PydanticAI-style structured outputs to turn one question into multiple web searches, run concurrently, and return a predictable markdown report. A result schema defines fields like research title, research main, and research bullets, so downstream code can rely on consistent formatting. A search tool wraps Tavily (or DuckDuckGo) and takes model-generated keywords plus a query number, while a dependency controls max results per search. The model decides how many searches to run (typically 3–5) and what queries to use. Injecting a “today’s date” dependency into the system prompt is crucial for time-sensitive questions; without it, outputs reflect the model’s cutoff (e.g., October 2023).
How does the agent keep its output consistent enough to be useful programmatically?
What controls how many web searches the agent performs, and why does that matter for cost?
Why switch from DuckDuckGo to Tavily in the example?
How does asynchronous execution change the agent’s behavior?
What’s the practical fix for “latest news” questions returning outdated results?
How does the agent decide what to search for in a single question?
Review Questions
- What fields are required by the result schema, and how do they map to the final markdown output?
- How does adding a “today’s date” dependency change the agent’s search strategy and the freshness of its answers?
- Where do max_results and query count come from, and how do they jointly affect token usage?
Key Points
1. Choose a search backend that won’t throttle development; DuckDuckGo may rate-limit without a proxy, while Tavily is used to avoid that.
2. Define a Pydantic result schema (title, main section, bullets) so the agent returns predictable, markdown-friendly structured output.
3. Use a tool wrapper for search that accepts model-generated query keywords and a query number, and enforce max results via dependencies.
4. Configure asynchronous search so multiple planned queries run in parallel, then synthesize the combined retrieved context into the final schema.
5. Control time sensitivity by injecting a formatted “today” date into the system prompt; otherwise answers can reflect the model’s cutoff (e.g., October 2023).
6. Expect token costs to rise with multi-query retrieval; add prompt guidance or reduce search counts when cost matters.