LangChain Agents - Joining Tools and Chains with Decisions

TL;DR

LangChain agents choose tools at runtime by using an executor prompt that maps the user’s question to the best next action.

Briefing

LangChain agents let a language model choose—at runtime—which tools to use (or whether to use any tools at all) to answer a user’s question. Instead of forcing every query through a single tool or a fixed chain, an agent runs an “executor” step that reads the input, decides the best next action based on the tools it was initialized with, and then iterates through tool calls until it can produce a final response. That decision-and-action loop is what turns a collection of tools (search, calculator, Wikipedia, terminal) into a flexible assistant that can handle mixed questions.

The walkthrough starts with an agent configured with two tools: a search tool backed by a search API and a calculator tool for math. The agent is initialized with a language model (OpenAI, temperature set to 0) and an agent type called “zero-shot react” (the identifier LangChain uses is `zero-shot-react-description`). Initialization also constructs the prompt the model uses to plan: it tells the model which tools it has access to, requires a structured output format, and asks for intermediate “thought” steps that determine which action to take and what input to send to that action.

When prompted with a casual question like “How are you today?”, the agent produces a direct response without calling any tool. When asked a question that requires external facts and computation, such as “Who is the United States president? What is his current age divided by 2?”, the agent uses search to identify the president (Joe Biden at the time of the walkthrough), uses search again to obtain his age (80), then calls the calculator to compute 80/2 (40), and finally returns the composed result.
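The search, search, calculator sequence described above can be sketched without any LangChain dependency. This is a toy illustration, not the library's implementation: `fake_search`, `calculator`, `scripted_policy`, and `run_agent` are hypothetical names, the search results are canned, and the scripted policy stands in for the language model's decisions.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (action, action_input)


def fake_search(query: str) -> str:
    """Canned stand-in for a search API (assumption: a real tool would call the web)."""
    canned = {"current US president": "Joe Biden", "Joe Biden age": "80"}
    return canned.get(query, "no result")


def calculator(expression: str) -> str:
    """Tiny stand-in calculator that only understands 'a/b'."""
    left, right = expression.split("/")
    return f"{float(left) / float(right):g}"


def scripted_policy(question: str, observations: List[str]) -> Step:
    """Stand-in for the LLM: choose the next action from what has been observed so far."""
    if "president" not in question.lower():
        return ("final", "I'm doing well, thanks for asking!")  # no tool needed
    if len(observations) == 0:
        return ("search", "current US president")
    if len(observations) == 1:
        return ("search", f"{observations[0]} age")
    if len(observations) == 2:
        return ("calculator", f"{observations[1]}/2")
    name, age, half = observations
    return ("final", f"{name} is {age}; half of that is {half}.")


def run_agent(question: str, tools: Dict[str, Tool],
              policy: Callable[[str, List[str]], Step], max_steps: int = 5):
    """Repeat decide -> act -> observe until the policy returns a final answer."""
    observations: List[str] = []
    trace: List[Tuple[str, str, str]] = []
    for _ in range(max_steps):
        action, action_input = policy(question, observations)
        if action == "final":
            return action_input, trace
        observation = tools[action](action_input)
        observations.append(observation)
        trace.append((action, action_input, observation))
    raise RuntimeError("agent did not finish within max_steps")


TOOLS: Dict[str, Tool] = {"search": fake_search, "calculator": calculator}

answer, trace = run_agent(
    "Who is the United States president? What is his current age divided by 2?",
    TOOLS, scripted_policy)
for action, action_input, observation in trace:
    print(f"{action}({action_input!r}) -> {observation}")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real loop: a model that never emits a final answer would otherwise call tools indefinitely.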

The agent’s ability to string multiple tools together becomes clearer with a compound query: finding the average age in the United States and comparing it to the current U.S. president’s age. It searches for the needed statistics, searches again for the president’s age, then synthesizes the final comparison.

Adding more tools expands what the agent can do. A second agent includes Wikipedia and a terminal tool in addition to search and calculator. For example, it looks up “head of DeepMind” via search, then uses Wikipedia to define “DeepMind”, identifying it as a British AI research lab founded in 2010 and a subsidiary of Alphabet. When asked to add 50 years to DeepMind’s founding year and predict whether AGI will exist by then, it retrieves the founding year from Wikipedia and performs the arithmetic (2010 + 50 = 2060) without invoking the calculator tool, which suggests that the language model can handle some simple computations directly. When asked for DeepMind’s office location, it again uses search, and it can then do math on the retrieved address (squaring the street number) by calling the calculator.
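The arithmetic in these examples is small enough to check by hand, but a calculator tool is still useful when the model's own arithmetic cannot be trusted. The sketch below is not the calculator used in the walkthrough; it is a hypothetical stand-in (`evaluate` is an invented name) showing one way to evaluate plain arithmetic without passing model-generated text to `eval`.

```python
import ast
import operator

# Only these binary operators are permitted.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate(expression: str) -> float:
    """Evaluate +, -, *, /, ** on numeric literals and reject everything else."""

    def walk(node: ast.AST):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))


print(evaluate("2010 + 50"))  # founding year plus 50 years
print(evaluate("80 / 2"))     # the age example from the first agent
```

Walking the syntax tree keeps the tool limited to numbers and operators, so an input such as a function call is rejected instead of executed. A production version would also want to bound exponent sizes.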

The terminal tool demonstrates another capability: inspecting the local environment. Asked what files exist in the current directory, the agent runs an `ls` command, navigates to a sample data folder, and lists available files. It can also search within files using commands like `grep` to answer questions such as whether a file about “California” exists. The walkthrough ends with a caution: terminal access can be dangerous if it allows destructive commands, so it should be added carefully and ideally constrained to safe operations.
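For readers less familiar with the shell, the two inspection steps amount to "list the directory" and "find files whose text contains a term". The sketch below reproduces them in Python against a temporary directory. The file names and contents are made-up placeholders, and `list_files`, `files_mentioning`, and `demo` are hypothetical helpers, not part of the terminal tool.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_files(directory: Path) -> List[str]:
    """Rough equivalent of `ls`: sorted names of the directory's entries."""
    return sorted(entry.name for entry in directory.iterdir())


def files_mentioning(directory: Path, term: str) -> List[str]:
    """Rough equivalent of `grep -ril TERM DIR`: names of files containing the term."""
    needle = term.lower()
    hits = []
    for path in sorted(directory.rglob("*")):
        if path.is_file() and needle in path.read_text(errors="ignore").lower():
            hits.append(path.name)
    return hits


def demo() -> Tuple[List[str], List[str]]:
    """Build a throwaway directory with placeholder files and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "notes_a.txt").write_text("placeholder text mentioning California\n")
        (root / "notes_b.txt").write_text("placeholder text mentioning Texas\n")
        return list_files(root), files_mentioning(root, "california")


names, hits = demo()
print(names)
print(hits)
```

Both helpers only read from disk, which is the kind of safe operation the caution above has in mind.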

Cornell Notes

LangChain agents decide which tools to use on the fly, based on the user’s question and the tools they were initialized with. Using an executor prompt (with a “zero-shot react” style), the model can either answer directly (no tools) or run a sequence of tool calls. Examples include searching for the U.S. president, searching for the president’s age, then using the calculator to compute a derived value. Expanding the toolset with Wikipedia and terminal enables fact lookups and local file inspection via commands like `ls` and `grep`. This flexibility comes with risk: terminal access must be handled carefully to avoid destructive actions.

What problem do agents solve compared with using only fixed tools or fixed chains?

Agents avoid forcing every question through the same tool (like always using a calculator) or a rigid chain. Instead, the agent reads the input, selects the most appropriate action from the tools it has been initialized with, and repeats until it can produce a final answer. That runtime decision-making is what lets one assistant handle casual conversation, web lookups, and multi-step computations.

How does the “zero-shot react” agent initialization influence tool use?

Initialization passes in the language model, the tool list, and an agent type described as “zero-shot react.” A key output of initialization is the executor prompt, which tells the model it has access to specific tools (e.g., search and calculator) and must follow a structured format: restate the question, write a thought, choose an action, provide the action input, observe the result, and repeat until a final answer is ready.
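To make the structured format concrete, here is a minimal sketch of how such a prompt could be assembled from a tool list and how one step of model output could be parsed. The prompt wording is a paraphrase written for this example, not LangChain's exact template, and `build_prompt` and `parse_step` are hypothetical helper names.

```python
import re
from typing import Dict, Tuple


def build_prompt(tools: Dict[str, str], question: str) -> str:
    """Assemble a ReAct-style prompt from tool names and descriptions."""
    tool_lines = "\n".join(f"{name}: {description}" for name, description in tools.items())
    tool_names = ", ".join(tools)
    return (
        "Answer the following question as best you can. "
        "You have access to the following tools:\n\n"
        f"{tool_lines}\n\n"
        "Use the following format:\n\n"
        "Question: the input question\n"
        "Thought: reasoning about what to do next\n"
        f"Action: one of [{tool_names}]\n"
        "Action Input: the input to the action\n"
        "Observation: the result of the action\n"
        "... (Thought/Action/Action Input/Observation can repeat)\n"
        "Final Answer: the answer to the original question\n\n"
        f"Question: {question}\n"
        "Thought:"
    )


def parse_step(model_output: str) -> Tuple[str, str]:
    """Return ('final', answer) or (tool_name, tool_input) from one model completion."""
    final = re.search(r"Final Answer:\s*(.*)", model_output, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    match = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", model_output)
    if match is None:
        raise ValueError(f"could not parse model output: {model_output!r}")
    return (match.group(1).strip(), match.group(2).strip())


prompt = build_prompt(
    {"search": "look up current facts", "calculator": "do arithmetic"},
    "Who is the United States president?",
)
print(prompt)
print(parse_step(" I should look this up.\nAction: search\nAction Input: current US president"))
```

Ending the prompt on `Thought:` nudges the model to continue in the same format, and the parser is what turns that free text back into a tool call the executor can run.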

Why did the agent not call tools for “How are you today?” but did for the president/age question?

“How are you today?” can be answered directly without external facts or computation, so the agent generates a response without tool calls. The president/age question requires two factual lookups (who the president is, and the president’s age) and then arithmetic (dividing by 2), so it calls search twice and then the calculator.

What changes when Wikipedia and terminal are added to the toolset?

With Wikipedia added, the agent can retrieve structured background facts (e.g., DeepMind’s founding year and relationship to Alphabet). With terminal added, it can inspect the local runtime environment—running `ls` to list files and using `grep` to search file contents—so it can answer questions about what’s present in the current directory or within specific files.

What safety issue comes with the terminal tool?

Terminal access can enable destructive commands. The walkthrough warns that adding terminal without safeguards could let an agent wipe files on a user’s hard drive just by being asked. The practical takeaway is to constrain what commands are allowed and be cautious when enabling terminal capabilities.
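One way to constrain the terminal is an allowlist checked before anything is executed. The sketch below is an assumption about how such a guard could look, not a feature of LangChain: `ALLOWED_COMMANDS`, `validate_command`, and `run_read_only` are hypothetical names, and the allowlist itself is illustrative (even `cat` can expose sensitive files).

```python
import shlex
import subprocess
from typing import List

# Illustrative allowlist of read-only commands (assumption, adjust per deployment).
ALLOWED_COMMANDS = {"ls", "grep", "cat", "pwd"}

# Shell metacharacters are rejected outright, which is deliberately conservative:
# it also blocks harmless uses such as a grep pattern containing "|".
FORBIDDEN_CHARS = (";", "|", "&", ">", "<", "`", "$")


def validate_command(command: str) -> List[str]:
    """Return the parsed argv if the command is allowed, else raise PermissionError."""
    if any(ch in command for ch in FORBIDDEN_CHARS):
        raise PermissionError(f"shell metacharacter in command: {command!r}")
    argv = shlex.split(command)
    if not argv:
        raise PermissionError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {argv[0]!r}")
    return argv


def run_read_only(command: str, timeout: float = 5.0) -> str:
    """Validate, then run without a shell so the arguments are never re-interpreted."""
    argv = validate_command(command)
    completed = subprocess.run(argv, capture_output=True, text=True,
                               timeout=timeout, check=False)
    return completed.stdout


print(validate_command("ls -la"))
print(validate_command('grep -ril "California" .'))
```

Passing a parsed argument list to `subprocess.run` instead of a shell string means a request like `ls; rm -rf .` fails validation rather than running two commands. An allowlist is only one layer; running the agent in a sandbox or container with limited file access is a stronger boundary.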

Review Questions

In what situations should an agent answer directly without tools, and how can you tell from the question?
Describe a multi-step tool sequence from the examples (which tools were called in what order) and what each step contributed.
What safeguards would you add before enabling a terminal tool for an agent?

Key Points

1. LangChain agents choose tools at runtime by using an executor prompt that maps the user’s question to the best next action.
2. Agent initialization requires a language model, a set of tools, and an agent type such as “zero-shot react,” which defines the planning/action format.
3. A two-tool agent (search + calculator) can answer mixed questions by chaining multiple tool calls, such as searching for a fact and then computing a derived value.
4. Adding Wikipedia expands factual retrieval, while adding terminal enables local environment inspection using commands like `ls` and `grep`.
5. Some simple arithmetic may be performed directly by the language model even when a calculator tool is available.
6. Terminal tool access is high-risk; it should be restricted to safe, non-destructive operations to prevent accidental data loss.

Highlights

Agents don’t just run a fixed pipeline—they decide whether to call tools and which ones to call next based on the question.

The executor prompt instructs the model to produce structured action steps (action, action input, observation) until it can output a final answer.

With terminal enabled, the agent can answer questions about local files by running `ls` and searching contents with `grep`.

Terminal access must be treated as a security boundary; without constraints, the agent could execute destructive commands.

Topics

LangChain Agents
Tool Selection
Zero-Shot React
Wikipedia Retrieval
Terminal Tool Safety