PydanticAI - Building a Research Agent

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Choose a search backend that won’t throttle development; DuckDuckGo may rate-limit without a proxy, so the example uses Tavily instead.

Briefing

A research agent built with PydanticAI can turn a single question into multiple targeted web searches, run them concurrently, and return a structured, markdown-ready report—while letting developers control output shape and inject time-sensitive context. The core payoff is reliability: instead of getting free-form text, the agent produces a predictable schema (title, main section, and bullet summaries) that can be rendered or further processed downstream.

The setup starts with choosing a search backend. DuckDuckGo is presented as a straightforward option but prone to rate limiting, especially without a proxy. Tavily is offered as a practical alternative: although its API is paid, it includes “a thousand free calls a month,” making it easier to iterate without throttling. Both search paths support synchronous and asynchronous usage, but the agent is configured for asynchronous calls so multiple searches can run in parallel.

The agent’s structure hinges on three Pydantic-style building blocks. First, a small “search data” class carries parameters like max results per search. Second, a “result type” defines the final structured output: a research title, a research main section, and research bullets, all as strings intended for markdown formatting. Third, a system prompt instructs the model to generate strong keywords for 3–5 searches total, label each query with a query number, and then synthesize the retrieved information into the required schema. The prompt is intentionally adjustable; it can steer the agent toward academic or commercial framing depending on the use case.
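
A minimal sketch of those three building blocks, assuming PydanticAI’s Agent API from around the time of the video (result_type was later renamed in newer releases) and using openai:gpt-4o as an illustrative model; class names and the prompt wording are paraphrased from the description above:

```python
from dataclasses import dataclass

from pydantic import BaseModel
from pydantic_ai import Agent


@dataclass
class SearchDataclass:
    """Per-run dependencies: caps how many results each search returns."""
    max_results: int


class ResearchResult(BaseModel):
    """Structured output the agent must return, as markdown-ready strings."""
    research_title: str
    research_main: str
    research_bullets: str


search_agent = Agent(
    'openai:gpt-4o',  # illustrative; any supported model string works
    deps_type=SearchDataclass,
    result_type=ResearchResult,
    system_prompt=(
        'You are a helpful research assistant and an expert in research. '
        'Given a question, write strong keywords for 3-5 searches in total, '
        'label each search with a query_number, then combine the retrieved '
        'information into the requested fields.'
    ),
)
```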

A dedicated search tool is then created using a decorator-based interface. The tool accepts the model-generated query and query number, calls Tavily’s search context method, and returns results constrained by the max-results dependency. This tool wiring is what lets the model decide how many searches to run and what each query should be, while the developer keeps tight control over what the tool returns.
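
A sketch of that tool wiring, assuming the tavily-python client and its get_search_context method; the tool name and docstring are illustrative:

```python
import os

from pydantic_ai import RunContext
from tavily import AsyncTavilyClient

# Assumes a TAVILY_API_KEY environment variable is set.
tavily_client = AsyncTavilyClient(api_key=os.environ['TAVILY_API_KEY'])


@search_agent.tool
async def get_search(
    ctx: RunContext[SearchDataclass], query: str, query_number: int
) -> str:
    """Run one web search for a model-generated query.

    query_number only labels the search, per the system prompt; the
    max-results dependency constrains what the tool returns.
    """
    return await tavily_client.get_search_context(
        query=query,
        max_results=ctx.deps.max_results,
    )
```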

When asked for “a very detailed bio on Sam Altman,” the agent chooses four searches and runs them asynchronously—each search retrieving three results—before formatting everything into the structured markdown response. The transcript also highlights cost awareness: token usage can climb quickly because multiple searches generate a lot of text for the model to process. The agent can be tuned to be more economical by limiting searches or adjusting prompt guidance.
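
Putting the pieces together, a run might look like the sketch below; result.data holds the populated schema, and the usage summary makes the token cost visible (very early PydanticAI releases exposed this as result.cost() rather than result.usage()):

```python
import asyncio


async def main() -> None:
    deps = SearchDataclass(max_results=3)  # three results per search, as in the example
    result = await search_agent.run('a very detailed bio on Sam Altman', deps=deps)

    # The structured fields render directly as markdown.
    print(f'# {result.data.research_title}\n')
    print(result.data.research_main)
    print(result.data.research_bullets)

    # Multi-query retrieval drives token counts up quickly; check the total.
    print(result.usage().total_tokens)


asyncio.run(main())
```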

The most consequential refinement comes with time sensitivity. Asking for “the latest new AI news” initially yields outdated information because the system prompt lacks a “today’s date” reference; the model falls back to its training cutoff (shown as October 2023). Injecting a “today” dependency into the system prompt fixes this: the agent now targets “the last few days,” decides it only needs three searches, and returns bullets aligned with early December 2024 items such as OpenAI’s “12 days of Shipmas” events and rumors about a ChatGPT Pro plan and the text-to-video Sora model.
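
A sketch of that fix, extending the dependency class with a todays_date field and using PydanticAI’s decorator for dynamic system prompts; the prompt wording is illustrative:

```python
from dataclasses import dataclass
from datetime import date

from pydantic_ai import RunContext


@dataclass
class SearchDataclass:
    """The dependency class, now carrying the current date.

    In a single script, define the class once with both fields.
    """
    max_results: int
    todays_date: str


@search_agent.system_prompt
async def add_current_date(ctx: RunContext[SearchDataclass]) -> str:
    # Appended to the static system prompt on every run.
    return (
        f"Today's date is {ctx.deps.todays_date}. "
        'For questions about the latest news, focus on the last few days.'
    )


# Format today as year-month-day and pass it in at run time.
deps = SearchDataclass(max_results=3, todays_date=date.today().strftime('%Y-%m-%d'))
```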

Overall, the approach demonstrates the fundamentals needed for a research agent: structured output via a Pydantic schema, tool-based web retrieval, asynchronous multi-query execution, and dependency-driven prompt injection to keep results current. The next step foreshadows a similar agent pattern combined with RAG and a vector store for grounded answers from curated documents.

Cornell Notes

The agent design uses PydanticAI-style structured outputs to turn one question into multiple web searches, run them concurrently, and return a predictable markdown report. A result schema defines fields like research title, research main, and research bullets, so downstream code can rely on consistent formatting. A search tool wraps Tavily (or DuckDuckGo) and takes model-generated keywords plus a query number, while a dependency controls max results per search. The model decides how many searches to run (typically 3–5) and what queries to use. Injecting a “today’s date” dependency into the system prompt is crucial for time-sensitive questions; without it, outputs reflect the model’s cutoff (e.g., October 2023).

How does the agent keep its output consistent enough to be useful programmatically?

It defines a structured “result type” using a Pydantic data class with explicit fields: a research title, a research main section, and research bullets. The agent is instructed to fill those fields, and the final output is formatted as markdown strings. Even when the model chooses different subheadings across runs, the schema remains stable: the response is still expected to populate result.data.research_title, result.data.research_main, and result.data.research_bullets.

What controls how many web searches the agent performs, and why does that matter for cost?

The system prompt tells the model to write strong keywords to do three to five searches in total, and the model then decides the exact number within that range. Each search returns multiple results (e.g., max_results set to 3 in the example), so token volume grows quickly because the model must read and synthesize many retrieved snippets. The transcript notes total token usage (about 3,700 in one run) and suggests limiting searches or adding prompt guidance to be more economical.

Why switch from DuckDuckGo to Tavily in the example?

DuckDuckGo is described as easy to use but prone to rate limiting for searches. Tavily is used instead to avoid those rate-limit issues during development. Although Tavily’s API is paid, it is described as providing a thousand free calls per month, allowing experimentation without requiring a credit card.

How does asynchronous execution change the agent’s behavior?

The agent is configured to use asynchronous calls for the search tool, so multiple queries run in parallel rather than waiting for one search to finish before starting the next. In the Sam Altman bio example, the agent selects four queries and retrieves results for each concurrently, then hands the combined information back to the model for final synthesis.

What’s the practical fix for “latest news” questions returning outdated results?

Inject today’s date into the system prompt via a dependency. Without that, the agent relies on the model’s cutoff (shown as October 2023), so “latest AI news” returns stale content. With a “today” dependency formatted as year-month-day, the agent targets “the last few days,” chooses fewer searches (three in the example), and produces bullets aligned with early December 2024 items.

How does the agent decide what to search for in a single question?

The system prompt instructs it to generate strong keywords and to plan 3–5 searches, each labeled with a query number. The model then produces queries like “Sam Altman biography early life,” “Sam Altman career OpenAI Y Combinator,” and “Sam Altman achievements contributions personal life and philanthropy.” Those queries drive the tool calls, and the retrieved context is later synthesized into the structured response.

Review Questions

  1. What fields are required by the result schema, and how do they map to the final markdown output?
  2. How does adding a “today’s date” dependency change the agent’s search strategy and the freshness of its answers?
  3. Where do max_results and query count come from, and how do they jointly affect token usage?

Key Points

  1. Choose a search backend that won’t throttle development; DuckDuckGo may rate-limit without a proxy, so the example uses Tavily instead.

  2. Define a Pydantic result schema (title, main section, bullets) so the agent returns predictable, markdown-friendly structured output.

  3. Use a tool wrapper for search that accepts model-generated query keywords and a query number, and enforce max results via dependencies.

  4. Configure asynchronous search so multiple planned queries run in parallel, then synthesize the combined retrieved context into the final schema.

  5. Control time sensitivity by injecting a formatted “today” date into the system prompt; otherwise answers can reflect the model’s cutoff (e.g., October 2023).

  6. Expect token costs to rise with multi-query retrieval; add prompt guidance or reduce search counts when cost matters.

Highlights

The agent converts one question into 3–5 model-chosen searches, runs them asynchronously, and returns a structured markdown report with a fixed schema.
Tavily is used to avoid DuckDuckGo rate limiting, and it’s described as offering a thousand free calls per month for early testing.
Without a “today” date in the system prompt, “latest AI news” drifts to cutoff-era results; injecting today’s date restores recency.
The model decides both the number of searches and the specific query keywords, while dependencies constrain tool behavior like max results.
