Building a LangChain Custom Medical Agent with Memory
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A LangChain “medical advice” agent can be built to answer questions using a site-restricted search (WebMD) and to carry context across multiple turns via conversation memory. The key move is combining a custom tool wrapper for “site:WebMD” queries with a custom prompt template and a custom output parser, so the model can reliably decide when to search, how to feed the query into that search, and when to stop and produce a final answer.
The system starts with a single search tool built on DuckDuckGo, but the tool is modified to constrain results to WebMD. Instead of letting search results come from any site, the wrapper prepends “site:WebMD.com” to the user’s query. That turns a generic search into a targeted retrieval step, making the agent’s downstream answers traceable to one source domain.
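This rewriting step can be sketched in plain Python. The function names below are illustrative (in LangChain this would wrap a search tool such as `DuckDuckGoSearchRun`); the essential move is just prepending the `site:` operator before the query reaches the search engine.

```python
def restrict_to_webmd(query: str) -> str:
    """Rewrite a free-form query so the search engine only returns WebMD pages."""
    return f"site:webmd.com {query.strip()}"

def search_webmd(query: str, search_fn) -> str:
    """Run the restricted query through an injected search callable
    (e.g., a DuckDuckGo search tool). Injecting the callable keeps the
    wrapper testable without network access."""
    return search_fn(restrict_to_webmd(query))
```

Because the restriction lives in the tool wrapper, the model never has to remember to add the `site:` operator itself.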
Next comes the agent’s prompt and scratchpad handling. The prompt instructs the model to answer as a compassionate medical professional while using the available tool(s). A custom prompt template manages intermediate reasoning steps in a ReAct-style loop: the scratchpad begins empty, then accumulates the model’s “thoughts,” “actions,” and “observations” as it iterates. Importantly, the scratchpad and tool wiring are generated dynamically by the agent framework at each step rather than being manually supplied by the user.
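The scratchpad serialization can be illustrated with a short sketch (simplified; function and variable names are assumptions, not LangChain's exact API). Each intermediate step is an (action log, observation) pair recorded by the agent loop, and the template replays them back into the prompt on every iteration:

```python
def format_scratchpad(intermediate_steps):
    """Serialize accumulated (action_log, observation) pairs into the
    ReAct scratchpad text that gets appended to the prompt."""
    thoughts = ""
    for action_log, observation in intermediate_steps:
        thoughts += action_log  # the model's own Thought/Action/Action Input text
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts
```

On the first call `intermediate_steps` is empty, so the scratchpad starts blank, exactly as described above.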
Reliability hinges on the custom output parser. The parser inspects model output for a “final answer” marker; if present, it signals the agent to finish. If not, it uses regex to extract the requested tool name (“action”) and the tool input. When the output doesn’t match the expected format—such as missing an action or failing to include parseable fields—the parser returns an error like “could not parse LLM output,” a practical reminder that many open-source models may not always follow strict tool-call formatting.
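A minimal version of that parsing logic looks like the following (a framework-independent sketch; the marker strings and return shape are assumptions based on the common ReAct format, not LangChain's exact classes):

```python
import re

FINAL_MARKER = "Final Answer:"

def parse_llm_output(text: str):
    """Return ("finish", answer) when the final-answer marker is present,
    or ("action", tool_name, tool_input) extracted via regex. Raises
    ValueError when the output fits neither format."""
    if FINAL_MARKER in text:
        return ("finish", text.split(FINAL_MARKER)[-1].strip())
    match = re.search(r"Action\s*:\s*(.*?)\nAction\s*Input\s*:\s*(.*)", text, re.DOTALL)
    if match is None:
        # Mirrors the "could not parse LLM output" failure mode described above.
        raise ValueError(f"Could not parse LLM output: `{text}`")
    return ("action", match.group(1).strip(), match.group(2).strip().strip('"'))
```

The error branch is what fires when a model drifts from the expected Thought/Action/Action Input format.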
The agent is then assembled into an AgentExecutor, with generation stopped at the “Observation:” boundary so that tool results are supplied by the executor rather than hallucinated by the model. In debug mode, the workflow becomes visible: the model emits an action like “Search WebMD” with an action input such as “sprained ankle treatment,” the tool returns search text, and the agent uses that observation to craft the final response. The example demonstrates a full cycle from tool selection to final answer generation.
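The full loop can be condensed into a self-contained sketch (simplified; `run_agent` and its call signature are illustrative, not LangChain's real AgentExecutor API). Note the `stop=["\nObservation:"]` argument: the model's generation is cut off right before the observation slot so the real tool output can be inserted there.

```python
import re

def run_agent(llm, tools, question, max_steps=5):
    """Minimal ReAct executor loop: call the model, parse its output,
    dispatch to a tool, append the observation, and repeat until a
    final answer appears or max_steps is exhausted."""
    scratchpad = ""
    for _ in range(max_steps):
        text = llm(question, scratchpad, stop=["\nObservation:"])
        if "Final Answer:" in text:
            return text.split("Final Answer:")[-1].strip()
        match = re.search(r"Action\s*:\s*(.*?)\nAction\s*Input\s*:\s*(.*)", text, re.DOTALL)
        if match is None:
            raise ValueError("Could not parse LLM output")
        observation = tools[match.group(1).strip()](match.group(2).strip())
        scratchpad += f"{text}\nObservation: {observation}\nThought: "
    return "Agent stopped after max_steps without a final answer."
```

Swapping in a stub `llm` callable makes the action → tool → observation → final answer cycle easy to trace without any model at all.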
Finally, conversation memory turns one-off Q&A into a multi-turn assistant. Without memory, a follow-up question like “what meds could I take?” would lack the earlier condition context. With ConversationBufferWindowMemory configured to keep the last two turns (k=2), the agent receives prior user questions and its own earlier responses, allowing it to correctly interpret subsequent prompts as referring to the same condition (e.g., a sprained ankle). The resulting answers include practical guidance such as cold compresses and typical healing timelines (given as “4 to 21 days” in the example), along with safety language urging medical attention if symptoms persist or worsen.
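The windowed-memory behavior is easy to model in isolation. The class below is a minimal stand-in, not LangChain's actual implementation: it keeps only the last k (human, AI) exchanges and renders them as history text that the prompt template prepends to each new question.

```python
from collections import deque

class WindowMemory:
    """Keep the last k conversation turns; older turns fall off the front."""
    def __init__(self, k=2):
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> str:
        """Render retained turns as prompt-ready history text."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
```

With k=2, a third exchange silently evicts the first, which is why the window must be large enough to still contain the turn that named the condition.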
Overall, the build pattern—site-restricted retrieval tool + prompt/scratchpad control + output parsing + optional memory—can be repurposed for Q&A over other trusted sources, including adding tools like Wikipedia lookups or custom APIs.
Cornell Notes
The agent is designed to answer medical questions using WebMD-only search results, then generate a final response after a tool-based retrieval step. It constrains DuckDuckGo queries by wrapping them with “site:WebMD.com,” so the model’s evidence comes from a single domain. A custom prompt template manages a ReAct-style scratchpad for intermediate steps, while a custom output parser extracts the model’s intended tool (“action”) and tool input, or detects “final answer” to stop. ConversationBufferWindowMemory adds short-term context across turns, letting follow-up questions (e.g., about medications or healing time) correctly refer to the previously discussed condition. This combination improves both relevance (source restriction) and usability (multi-turn understanding).
How does the agent ensure its answers rely on WebMD rather than general search results?
Why is a custom output parser necessary in a tool-using agent?
What role does the custom prompt template play in the agent’s reasoning loop?
How does the agent decide when to stop searching and produce a final response?
What changes when conversation memory is added, and why does it matter for follow-up questions?
Review Questions
- What specific string manipulation turns a general DuckDuckGo search into a WebMD-only search tool?
- Describe how the custom output parser distinguishes between “tool call” outputs and “final answer” outputs.
- How does ConversationBufferWindowMemory change the agent’s interpretation of follow-up questions?
Key Points
1. Constrain retrieval by wrapping DuckDuckGo queries with “site:WebMD.com,” so evidence comes from WebMD only.
2. Use a custom prompt template to manage a ReAct-style scratchpad that accumulates thoughts, actions, and observations automatically.
3. Implement a custom output parser that extracts tool name and tool input via regex, and stops when a “final answer” marker appears.
4. Tune agent stopping behavior around observations so tool results are incorporated before generating the final response.
5. Enable debug mode to trace the full loop: model action → tool input → tool output → final answer.
6. Add ConversationBufferWindowMemory (configured to keep the last two turns) to support multi-turn follow-ups that depend on earlier medical context.
7. Include safety framing in the prompt (e.g., advise seeking medical attention if symptoms persist or worsen) when generating medical guidance.