Guardrails with LangChain: A Complete Crash Course for Building Safe AI Agents

Krish Naik · 5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Guardrails control what an AI agent processes and returns by enforcing safe inputs, approved actions, and validated outputs across the pipeline.

Briefing

Safe AI agents rely on guardrails that control what enters and exits an LLM-driven workflow. In practice, guardrails sit around the agent pipeline—before inputs reach the model, during tool calls, and after outputs are generated—so the system only processes appropriate requests, performs approved actions, and returns outputs that meet defined safety rules. The need becomes obvious when an agent is asked for harmful instructions like “how to hack a server” or for disallowed content such as generating deceptive or unsafe images. Without guardrails, the model may comply with unsafe prompts; with them, the workflow can block, redact, or require review before any risky step happens.

The crash course breaks guardrails into two core implementation strategies: deterministic and model-based. Deterministic guardrails use rule-based logic such as keyword matching or rejection rules. They are fast and incur zero additional LLM cost, but they struggle with semantics—meaning they can miss unsafe intent that doesn’t contain obvious keywords. Model-based guardrails delegate safety judgment to an LLM, prompting it to classify content as “safe” or “unsafe.” This approach better captures meaning and context, but it adds cost because it requires extra model calls for each input.
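To make the contrast concrete, here is a minimal sketch of both strategies in Python, assuming an OpenAI-backed chat model via langchain_openai; the banned-keyword list, model name, and classification prompt are illustrative choices, not taken from the course.

```python
from langchain_openai import ChatOpenAI

BANNED_KEYWORDS = {"hack", "exploit", "malware"}

def deterministic_guardrail(text: str) -> bool:
    """Rule-based check: fast and free, but purely lexical."""
    lowered = text.lower()
    return not any(word in lowered for word in BANNED_KEYWORDS)

def model_based_guardrail(text: str) -> bool:
    """LLM-based check: captures intent and context, but adds a model call."""
    judge = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    verdict = judge.invoke(
        "Classify the following request as exactly one word, 'safe' or 'unsafe':\n\n"
        + text
    )
    return verdict.content.strip().lower() == "safe"
```

The deterministic check would wave through a cleverly reworded unsafe request, while the model-based check costs one extra LLM call per input; layering both is a common compromise.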

To implement these ideas concretely, the walkthrough uses LangChain, emphasizing how guardrails can be added as middleware hooks in an agent workflow. LangChain’s middleware approach lets developers intercept execution at multiple points: before the agent runs, around tool calls, and after the agent produces a response. Several built-in guardrail types are highlighted.
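As a rough sketch of what attaching a guardrail looks like in recent LangChain releases, the snippet below passes a PII middleware into an agent; the import paths, the create_agent signature, and the PIIMiddleware arguments are assumptions to verify against the version you have installed.

```python
from langchain.agents import create_agent
from langchain.agents.middleware import PIIMiddleware  # assumed import path

# Middleware components hook into the agent loop: before the model runs,
# around tool calls, and after the final response is produced.
agent = create_agent(
    model="openai:gpt-4o-mini",   # assumed provider:model identifier format
    tools=[],
    middleware=[PIIMiddleware("email", strategy="redact")],
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Contact me at jane@example.com"}]}
)
```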

First is PII detection middleware, designed to prevent personal data leakage. It can detect common PII categories including email addresses, credit cards, IPs, MAC addresses, and URLs. Depending on the configured strategy, it can redact (replace with a placeholder format), mask (star out parts), hash (transform via hashing), or block (raise an exception) when sensitive patterns are detected. Importantly, this middleware can apply to inputs, outputs, and even tool calls—so sensitive data doesn’t slip through intermediate steps.
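As a standalone illustration of what redact, mask, and hash can mean for a single PII type, the sketch below uses a simple email regex; the pattern and placeholder formats are simplified stand-ins, not the built-in middleware's actual implementation.

```python
import re
import hashlib

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(text: str) -> str:
    """Redact: replace each email with a typed placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def mask_emails(text: str) -> str:
    """Mask: keep the first two characters, star out the rest."""
    return EMAIL_RE.sub(lambda m: m.group()[:2] + "****@****", text)

def hash_emails(text: str) -> str:
    """Hash: replace each email with a short, stable digest."""
    return EMAIL_RE.sub(
        lambda m: hashlib.sha256(m.group().encode()).hexdigest()[:12], text
    )
```

The block strategy would instead raise an exception as soon as a match is found, stopping the workflow outright.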

Second is human-in-the-loop middleware, which pauses agent execution before sensitive operations and waits for explicit approval or rejection. The course frames this as essential for high-impact actions such as financial transactions, sending emails externally, or deleting production data. A checkpointer is used to track state per user/session (via threads and checkpointing), and the workflow resumes only after an approval command is issued.
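Conceptually, the pattern reduces to holding a sensitive tool call until an approver responds. The sketch below is a plain-Python stand-in for that flow; the tool names and the console-prompt approval channel are hypothetical, and the course's version relies on LangChain's interrupt/resume mechanism with a checkpointer and thread IDs instead.

```python
SENSITIVE_TOOLS = {"transfer_funds", "send_external_email", "delete_production_data"}

def require_approval(tool_name: str, args: dict) -> bool:
    """Stand-in for a real approval channel (UI button, ticket, chat reply)."""
    answer = input(f"Approve {tool_name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"

def guarded_tool_call(tool_name: str, args: dict, tools: dict):
    """Pause before sensitive tools; run anything else straight through."""
    if tool_name in SENSITIVE_TOOLS and not require_approval(tool_name, args):
        return {"status": "rejected", "reason": "human approval denied"}
    return tools[tool_name](**args)
```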

Third are custom guardrails using before-agent and after-agent hooks. A before-agent hook acts as an input filter—blocking requests containing banned keywords or performing checks like authentication, rate limiting, or category-based request blocking. An after-agent hook performs output validation, including compliance scanning and safety evaluation; unsafe responses can be suppressed or replaced, while safe ones pass through.
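A minimal sketch of both hooks, using the course's banned-keyword idea as the input filter; the keyword list and the compliance check are illustrative placeholders.

```python
BANNED_KEYWORDS = {"hack", "ddos", "ransomware"}

def before_agent_hook(user_input: str) -> str | None:
    """Input filter: return a refusal to short-circuit the agent, or None to continue."""
    if any(word in user_input.lower() for word in BANNED_KEYWORDS):
        return "I cannot process this request."
    return None

def passes_compliance_scan(text: str) -> bool:
    """Placeholder output check; could be a rules engine or an LLM judge in practice."""
    return not any(word in text.lower() for word in BANNED_KEYWORDS)

def after_agent_hook(response: str) -> str:
    """Output validator: suppress responses that fail the safety/compliance check."""
    if not passes_compliance_scan(response):
        return "The generated response was withheld by a safety policy."
    return response
```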

Finally, the course shows layered guardrails, stacking multiple middleware components in sequence—such as content filtering, PII protection, human approval, and model-based safety checks—so complex safety requirements are enforced end-to-end. The session closes by pointing to a real-world healthcare chatbot example that combines these guardrails in a practical setting, with a promise of a deeper dive later.
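Putting it together, a layered setup simply runs these checks in order around the agent call. A minimal sketch, reusing the hypothetical helpers from the earlier snippets and treating the agent as a plain text-in, text-out callable:

```python
def run_with_guardrails(user_input: str, agent) -> str:
    """Layer the earlier checks around a single agent invocation."""
    # Layer 1: deterministic input filter (before-agent hook)
    refusal = before_agent_hook(user_input)
    if refusal:
        return refusal
    # Layer 2: PII handling on the way in
    safe_input = redact_emails(user_input)
    # Layer 3: the agent itself; sensitive tools go through guarded_tool_call,
    # which waits for human approval
    response = agent(safe_input)
    # Layer 4: PII handling plus output validation on the way out
    return after_agent_hook(redact_emails(response))
```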

Cornell Notes

Guardrails are safety mechanisms that control what an AI agent accepts and returns. They wrap an agent pipeline so inputs are screened, tool calls are constrained, and outputs are validated before users see them. The course contrasts two approaches: deterministic guardrails (rule/keyword based, zero extra LLM cost but weaker semantic understanding) and model-based guardrails (LLM classification of “safe/unsafe,” better semantics but added cost). Using LangChain middleware, guardrails can be applied at key hook points: PII detection (redact/mask/hash/block for email, credit cards, IPs, MACs, URLs), human-in-the-loop approval for sensitive actions, and custom before/after-agent hooks for filtering and compliance scanning. Layered guardrails stack these protections for stronger end-to-end safety.

Why do guardrails need to sit around the entire agent pipeline rather than only checking the final answer?

Because risk can appear at multiple stages: a harmful prompt can reach the LLM, sensitive data can leak through tool calls, and unsafe content can be generated in the final response. The course’s definition emphasizes guardrails that control “what goes into and comes out of an AI agent,” ensuring safe inputs, approved actions, and validated outputs. LangChain middleware supports this by applying protections before the agent runs, during tool execution, and after outputs are produced.

How do deterministic and model-based guardrails differ in cost and safety coverage?

Deterministic guardrails rely on rule-based algorithms like keyword matching or rejection lists. They have zero LLM cost because they don’t call a model for classification, but they may miss unsafe intent that doesn’t include obvious keywords. Model-based guardrails use an LLM to evaluate content safety (e.g., prompting for “safe” vs “unsafe”). They better capture semantics and context, but they add cost because each input requires an extra LLM call.

What does LangChain’s PII middleware protect against, and what actions can it take?

PII middleware detects personally identifiable information such as email addresses, credit cards, IPs, MAC addresses, and URLs. It supports strategies like redact (replace with a formatted placeholder), mask (use stars), hash (apply a hashing algorithm), and block (raise an exception). The course also notes that it can apply to inputs, outputs, and tool calls, reducing the chance that sensitive data leaks mid-workflow.

When is human-in-the-loop middleware most appropriate, and how does it work operationally?

It’s best for operations with significant business impact—financial transactions, sending emails to external parties, deleting production data, or other sensitive actions. The middleware pauses execution before the sensitive tool runs and waits for human approval or rejection. A checkpointer (threads + session ID) tracks which user/workflow is being paused, and the agent resumes only after an approval command is issued.

How do before-agent and after-agent hooks enable custom safety logic?

A before-agent hook runs before any LLM call, functioning like an input filter. The course demonstrates a custom middleware that blocks requests containing banned keywords (e.g., detecting “hack” and returning a message like “I cannot process this request…”). An after-agent hook validates the final response before it reaches the user; it can evaluate safety/compliance and suppress unsafe outputs or allow safe ones to pass. This supports both deterministic filtering and model-based compliance scanning.

What does “layered guardrails” mean in practice?

Layered guardrails stack multiple middleware components in sequence so different protections apply together. The course’s example layers content filtering, PII middleware, human-in-the-loop approval, and a model-based safety guard for output evaluation. The result is end-to-end enforcement: sensitive data is handled, risky actions require approval, and final responses are checked for safety.

Review Questions

  1. What trade-offs arise when choosing deterministic vs model-based guardrails for an agent?
  2. Describe how PII middleware can prevent leakage across inputs, outputs, and tool calls.
  3. Give one example of a before-agent hook use case and one example of an after-agent hook use case.

Key Points

  1. Guardrails control what an AI agent processes and returns by enforcing safe inputs, approved actions, and validated outputs across the pipeline.
  2. Deterministic guardrails use rule-based checks (e.g., keyword matching) with zero extra LLM cost but weaker semantic understanding.
  3. Model-based guardrails use an LLM to classify safety (“safe/unsafe”), improving semantic coverage at the expense of additional LLM calls.
  4. LangChain middleware enables guardrails at multiple hook points, including before-agent, around tool calls, and after-agent validation.
  5. PII detection middleware can detect email, credit cards, IPs, MAC addresses, and URLs and apply strategies such as redact, mask, hash, or block.
  6. Human-in-the-loop middleware pauses execution before sensitive operations and resumes only after explicit approval or rejection, using checkpointing to track sessions.
  7. Layered guardrails combine multiple middleware protections in sequence to enforce safety end-to-end.

Highlights

Guardrails are positioned around the agent pipeline so safety checks happen before LLM calls, during tool execution, and before final output reaches users.
Deterministic guardrails are cheap and fast but can miss unsafe intent that lacks obvious keywords; model-based guardrails catch semantics but add per-input LLM cost.
LangChain’s PII middleware can redact, mask, hash, or block detected personal data—and can apply to inputs, outputs, and tool calls.
Human-in-the-loop guardrails require approval before sensitive actions like sending emails or deleting records, preventing irreversible mistakes.
Layered guardrails stack content filtering, PII protection, human approval, and model-based safety evaluation for stronger coverage.
