Guardrails with LangChain: A Complete Crash Course for Building Safe AI Agents
Based on Krish Naik's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Guardrails control what an AI agent processes and returns by enforcing safe inputs, approved actions, and validated outputs across the pipeline.
Briefing
Safe AI agents rely on guardrails that control what enters and exits an LLM-driven workflow. In practice, guardrails sit around the agent pipeline—before inputs reach the model, during tool calls, and after outputs are generated—so the system only processes appropriate requests, performs approved actions, and returns outputs that meet defined safety rules. The need becomes obvious when an agent is asked for harmful instructions like “how to hack a server” or for disallowed content such as generating deceptive or unsafe images. Without guardrails, the model may comply with unsafe prompts; with them, the workflow can block, redact, or require review before any risky step happens.
The crash course breaks guardrails into two core implementation strategies: deterministic and model-based. Deterministic guardrails use rule-based logic such as keyword matching or rejection rules. They are fast and incur zero additional LLM cost, but they struggle with semantics—meaning they can miss unsafe intent that doesn’t contain obvious keywords. Model-based guardrails delegate safety judgment to an LLM, prompting it to classify content as “safe” or “unsafe.” This approach better captures meaning and context, but it adds cost because it requires extra model calls for each input.
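The contrast can be made concrete with a small framework-free sketch. The pattern list and the `classify` callable below are illustrative stand-ins (the callable represents an extra LLM call that returns "safe" or "unsafe"), not part of any LangChain API:

```python
import re

# Illustrative banned patterns -- a real deployment would maintain a far
# larger, curated list.
BLOCKED_PATTERNS = [r"\bhack\b", r"\bmalware\b", r"\bransomware\b"]

def deterministic_check(text: str) -> bool:
    """Rule-based guardrail: flags text matching any banned pattern.

    Fast and free of LLM cost, but blind to paraphrased intent.
    """
    return any(re.search(p, text, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def model_based_check(text: str, classify) -> bool:
    """Model-based guardrail: delegates judgment to `classify`, a stand-in
    for an LLM-backed classifier returning "safe" or "unsafe".
    """
    return classify(text) == "unsafe"

print(deterministic_check("How do I hack a server?"))       # True
print(deterministic_check("Help me break into a machine"))  # False -- same intent, no keyword
```

The second call is exactly the semantic gap the course describes: the unsafe intent survives a keyword filter, which is where the (costlier) model-based check earns its keep.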
To implement these ideas concretely, the walkthrough uses LangChain, emphasizing how guardrails can be added as middleware hooks in an agent workflow. LangChain’s middleware approach lets developers intercept execution at multiple points: before the agent runs, around tool calls, and after the agent produces a response. Several built-in guardrail types are highlighted.
First is PII detection middleware, designed to prevent personal data leakage. It can detect common PII categories including email addresses, credit cards, IPs, MAC addresses, and URLs. Depending on the configured strategy, it can redact (replace with a placeholder format), mask (star out parts), hash (transform via hashing), or block (raise an exception) when sensitive patterns are detected. Importantly, this middleware can apply to inputs, outputs, and even tool calls—so sensitive data doesn’t slip through intermediate steps.
Second is human-in-the-loop middleware, which pauses agent execution before sensitive operations and waits for explicit approval or rejection. The course frames this as essential for high-impact actions such as financial transactions, sending emails externally, or deleting production data. A checkpointer persists state per user/session (keyed by thread ID), so the paused workflow can resume on the same thread once an approval command is issued.
Third are custom guardrails using before-agent and after-agent hooks. A before-agent hook acts as an input filter—blocking requests containing banned keywords or performing checks like authentication, rate limiting, or category-based request blocking. An after-agent hook performs output validation, including compliance scanning and safety evaluation; unsafe responses can be suppressed or replaced, while safe ones pass through.
Finally, the course shows layered guardrails, stacking multiple middleware components in sequence—such as content filtering, PII protection, human approval, and model-based safety checks—so complex safety requirements are enforced end-to-end. The session closes by pointing to a real-world healthcare chatbot example that combines these guardrails in a practical setting, with a promise of a deeper dive later.
Cornell Notes
Guardrails are safety mechanisms that control what an AI agent accepts and returns. They wrap an agent pipeline so inputs are screened, tool calls are constrained, and outputs are validated before users see them. The course contrasts two approaches: deterministic guardrails (rule/keyword based, zero extra LLM cost but weaker semantic understanding) and model-based guardrails (LLM classification of “safe/unsafe,” better semantics but added cost). Using LangChain middleware, guardrails can be applied at key hook points: PII detection (redact/mask/hash/block for email, credit cards, IPs, MACs, URLs), human-in-the-loop approval for sensitive actions, and custom before/after-agent hooks for filtering and compliance scanning. Layered guardrails stack these protections for stronger end-to-end safety.
- Why do guardrails need to sit around the entire agent pipeline rather than only checking the final answer?
- How do deterministic and model-based guardrails differ in cost and safety coverage?
- What does LangChain’s PII middleware protect against, and what actions can it take?
- When is human-in-the-loop middleware most appropriate, and how does it work operationally?
- How do before-agent and after-agent hooks enable custom safety logic?
- What does “layered guardrails” mean in practice?
Review Questions
- What trade-offs arise when choosing deterministic vs model-based guardrails for an agent?
- Describe how PII middleware can prevent leakage across inputs, outputs, and tool calls.
- Give one example of a before-agent hook use case and one example of an after-agent hook use case.
Key Points
1. Guardrails control what an AI agent processes and returns by enforcing safe inputs, approved actions, and validated outputs across the pipeline.
2. Deterministic guardrails use rule-based checks (e.g., keyword matching) with zero extra LLM cost but weaker semantic understanding.
3. Model-based guardrails use an LLM to classify safety (“safe/unsafe”), improving semantic coverage at the expense of additional LLM calls.
4. LangChain middleware enables guardrails at multiple hook points, including before-agent, around tool calls, and after-agent validation.
5. PII detection middleware can detect email, credit cards, IPs, MAC addresses, and URLs and apply strategies such as redact, mask, hash, or block.
6. Human-in-the-loop middleware pauses execution before sensitive operations and resumes only after explicit approval or rejection, using checkpointing to track sessions.
7. Layered guardrails combine multiple middleware protections in sequence to enforce safety end-to-end.