
Building a LangChain Custom Medical Agent with Memory

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Constrain retrieval by wrapping DuckDuckGo queries with “site:WebMD.com,” so evidence comes from WebMD only.

Briefing

A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns via conversation memory. The key move is combining a custom tool wrapper for “site:WebMD” queries with a custom prompt template and a custom output parser, so the model can reliably decide when to search, how to feed the query into that search, and when to stop and produce a final answer.

The system starts with a single search tool built on DuckDuckGo, but the tool is modified to constrain results to WebMD. Instead of letting search results come from any site, the wrapper prepends “site:WebMD.com” to the user’s query. That turns a generic search into a targeted retrieval step, making the agent’s downstream answers traceable to one source domain.
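A minimal sketch of that wrapper, with the search backend stubbed as a plain callable so the site-restriction logic stands on its own. The function names (`restrict_to_webmd`, `search_webmd`) are illustrative; in the video the wrapper sits around LangChain's DuckDuckGo search tool.

```python
# Hypothetical sketch: constrain any search query to WebMD results.
# The search backend is passed in as a callable so this runs without
# LangChain installed.

def restrict_to_webmd(query: str) -> str:
    """Prepend a DuckDuckGo site filter so results come from WebMD only."""
    return f"site:webmd.com {query}"

def search_webmd(query: str, search_fn) -> str:
    """Run the user's query through any search callable, site-restricted."""
    return search_fn(restrict_to_webmd(query))
```

Because the filter is just a string prefix, the same pattern works for any trusted domain: swap `webmd.com` for another site to repurpose the tool.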

Next comes the agent’s prompt and scratchpad handling. The prompt instructs the model to answer as a compassionate medical professional while using the available tool(s). A custom prompt template is used to manage intermediate reasoning steps in a ReAct-style loop: the scratchpad begins empty, then accumulates the model’s “thoughts,” “actions,” and “observations” as it iterates. Importantly, the scratchpad and tool wiring are generated dynamically by the agent framework rather than being manually supplied by the user.
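The scratchpad assembly can be sketched as a pure function: fold the accumulated (action, observation) pairs into a ReAct-style transcript before each model call. The template wording and function names below are illustrative, not the video's exact code.

```python
# Hypothetical sketch of what the custom prompt template does each turn.

TEMPLATE = """Answer the question as a compassionate medical professional.
You have access to the following tools: {tools}

Question: {question}
{agent_scratchpad}"""

def build_prompt(question: str, intermediate_steps, tools="search WebMD"):
    """Fold prior (action_text, observation) pairs into the scratchpad."""
    scratchpad = ""
    for action_text, observation in intermediate_steps:
        scratchpad += f"{action_text}\nObservation: {observation}\nThought: "
    return TEMPLATE.format(tools=tools, question=question,
                           agent_scratchpad=scratchpad)
```

On the first call the scratchpad is empty; after each tool use, the prior action and its observation are appended, ending with `Thought: ` so the model continues reasoning from where it left off.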

Reliability hinges on the custom output parser. The parser inspects model output for a “final answer” marker; if present, it signals the agent to finish. If not, it uses regex to extract the requested tool name (“action”) and the tool input. When the output doesn’t match the expected format—such as missing an action or failing to include parseable fields—the parser returns an error like “could not parse LLM output,” a practical reminder that many open-source models may not always follow strict tool-call formatting.
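A self-contained sketch of such a parser, using the standard ReAct marker strings ("Final Answer:", "Action:", "Action Input:"); the exact markers and error message in the video may differ.

```python
import re

# Hypothetical sketch of the custom output parser: detect a final answer,
# or extract the requested tool and its input via regex.

def parse_llm_output(text: str):
    if "Final Answer:" in text:
        # Finish mode: return everything after the marker as the answer.
        return ("finish", text.split("Final Answer:")[-1].strip())
    match = re.search(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*(.*)",
                      text, re.DOTALL)
    if match is None:
        # Open-source models often drift from strict tool-call formatting.
        raise ValueError(f"Could not parse LLM output: `{text}`")
    return ("tool", match.group(1).strip(), match.group(2).strip().strip('"'))
```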

The agent is then assembled into an AgentExecutor with stopping behavior tuned to the “observation” boundary. In debug mode, the workflow becomes visible: the model emits an action like “search WebMD” with an action input such as “sprained ankle treatment,” the tool returns search text, and the agent uses that observation to craft the final response. The example demonstrates a full cycle from tool selection to final answer generation.
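The loop the executor runs can be sketched in plain Python: call the model with `"\nObservation:"` as a stop sequence, parse the output, run the chosen tool, append the result as an observation, and repeat until the parser reports a final answer. All names here are illustrative stand-ins, not LangChain's actual API.

```python
# Hypothetical sketch of the AgentExecutor loop. `llm`, `tools`, `parse`,
# and `build_prompt` are injected callables so the control flow stands alone.

def run_agent(llm, tools, question, parse, build_prompt, max_steps=5):
    steps = []  # (model_output, observation) pairs, i.e. the scratchpad
    for _ in range(max_steps):
        # Stop generation at the observation boundary so the real tool
        # result, not a hallucinated one, fills the observation slot.
        output = llm(build_prompt(question, steps), stop=["\nObservation:"])
        parsed = parse(output)
        if parsed[0] == "finish":
            return parsed[1]  # final answer text
        _, tool_name, tool_input = parsed
        observation = tools[tool_name](tool_input)
        steps.append((output.strip(), observation))
    return "Agent stopped: step limit reached."
```

The step limit guards against models that never emit a final-answer marker, mirroring the executor's max-iterations safeguard.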

Finally, conversation memory turns one-off Q&A into a multi-turn assistant. Without memory, follow-up questions like “what meds could I take?” would lack the earlier condition context. With ConversationBufferMemory configured to remember the last two turns, the agent receives prior user questions and its own earlier responses, allowing it to correctly interpret subsequent prompts as referring to the same condition (e.g., a sprained ankle). The resulting answers include practical guidance such as cold compresses and typical healing timelines (given as “4 to 21 days” in the example), along with safety language urging medical attention if symptoms persist or worsen.
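The "last two turns" behavior can be sketched as a windowed buffer. The video describes ConversationBufferMemory remembering two turns; in current LangChain the windowed variant is `ConversationBufferWindowMemory(k=2)`. The class below is a plain-Python stand-in, not the library implementation.

```python
from collections import deque

# Minimal sketch of window-style conversation memory: keep only the
# last k user/assistant exchanges for injection into the prompt.

class WindowMemory:
    def __init__(self, k: int = 2):
        self.turns = deque(maxlen=k)  # each entry: (user_msg, ai_msg)

    def save(self, user_msg: str, ai_msg: str) -> None:
        self.turns.append((user_msg, ai_msg))

    def history(self) -> str:
        """Render the remembered turns as chat history text."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)
```

With `k=2`, a third exchange silently evicts the first, which is why a follow-up like "what meds could I take?" still resolves to the sprained-ankle context while much older topics drop away.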

Overall, the build pattern—site-restricted retrieval tool + prompt/scratchpad control + output parsing + optional memory—can be repurposed for Q&A over other trusted sources, including adding tools like Wikipedia lookups or custom APIs.

Cornell Notes

The agent is designed to answer medical questions using WebMD-only search results, then generate a final response after a tool-based retrieval step. It constrains DuckDuckGo queries by wrapping them with “site:WebMD.com,” so the model’s evidence comes from a single domain. A custom prompt template manages a ReAct-style scratchpad for intermediate steps, while a custom output parser extracts the model’s intended tool (“action”) and tool input, or detects “final answer” to stop. ConversationBufferMemory adds short-term context across turns, letting follow-up questions (e.g., about medications or healing time) correctly refer to the previously discussed condition. This combination improves both relevance (source restriction) and usability (multi-turn understanding).

How does the agent ensure its answers rely on WebMD rather than general search results?

It wraps DuckDuckGo’s search tool so every query is prefixed with a site restriction: “site:WebMD.com.” The tool wrapper takes the user’s query, prepends the site filter, and then runs the search. The agent is also given a tool name like “search WebMD,” so the model’s action explicitly corresponds to WebMD-restricted retrieval.

Why is a custom output parser necessary in a tool-using agent?

The parser bridges model text output and tool execution. It checks whether the model output contains a “final answer” marker; if so, it triggers finish mode. Otherwise, it uses regex to parse out the requested tool (“action”) and the tool input. If the model output doesn’t match the expected structure—such as missing an action field—the parser returns an error like “could not parse LLM output,” preventing the agent from attempting an invalid tool call.

What role does the custom prompt template play in the agent’s reasoning loop?

The prompt template defines the ReAct-style format and manages the agent scratchpad. The scratchpad starts empty and is filled automatically with intermediate “thought,” “action,” and “observation” steps as the agent iterates. A custom prompt template is used so intermediate steps are handled consistently, while the user only provides the question and the framework supplies the evolving scratchpad and tool context.

How does the agent decide when to stop searching and produce a final response?

Stopping is coordinated between the output parser and the agent executor. The output parser looks for “final answer” in the model output; when found, it signals the agent to finish. The executor also halts generation at the observation boundary (using a stop sequence at the newline before “Observation:”), so real tool results can be injected as observations before the model continues toward the final response.

What changes when conversation memory is added, and why does it matter for follow-up questions?

Without memory, each query is treated independently, so follow-ups like “what meds could I take?” won’t automatically connect to the earlier topic (e.g., a sprained ankle). With ConversationBufferMemory configured to remember the last two turns, the agent receives prior conversation history in the prompt. That lets it interpret new questions as continuing the same medical context and answer accordingly (medications, healing time, etc.).

Review Questions

  1. What specific string manipulation turns a general DuckDuckGo search into a WebMD-only search tool?
  2. Describe how the custom output parser distinguishes between “tool call” outputs and “final answer” outputs.
  3. How does ConversationBufferMemory change the agent’s interpretation of follow-up questions?

Key Points

  1. Constrain retrieval by wrapping DuckDuckGo queries with “site:WebMD.com,” so evidence comes from WebMD only.

  2. Use a custom prompt template to manage a ReAct-style scratchpad that accumulates thoughts, actions, and observations automatically.

  3. Implement a custom output parser that extracts tool name and tool input via regex, and stops when a “final answer” marker appears.

  4. Tune agent stopping behavior around observations so tool results are incorporated before generating the final response.

  5. Enable debug mode to trace the full loop: model action → tool input → tool output → final answer.

  6. Add ConversationBufferMemory (configured to keep the last two turns) to support multi-turn follow-ups that depend on earlier medical context.

  7. Include safety framing in the prompt (e.g., advise seeking medical attention if symptoms persist or worsen) when generating medical guidance.

Highlights

WebMD-only retrieval is achieved by prefixing every search with “site:WebMD.com,” turning a general search tool into a trusted-source lookup.
The custom output parser is the reliability hinge: it either extracts an “action” + input for tool use or detects “final answer” to end the loop.
ConversationBufferMemory lets follow-ups like “what meds could I take?” correctly refer to the previously discussed condition (sprained ankle).
In the example, the agent’s healing-time answer is given as “4 to 21 days,” paired with practical guidance like reducing swelling and using cold compresses.
