
5-Getting Started With Agentic RAG With Detailed Implementation Using LangGraph

Krish Naik·
5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Agentic RAG makes retrieval conditional by using an autonomous decision layer rather than always running a fixed retrieve-then-generate pipeline.

Briefing

Agentic RAG shifts retrieval from a fixed pipeline to a decision made on the fly: an autonomous agent chooses when to fetch context, what to fetch, and how many retrievals to perform before an LLM generates an answer. In traditional RAG, a query always triggers retrieval from a vector database, then the retrieved context is stuffed into a prompt for generation. Agentic RAG keeps the same core ingredients—retrieval augmented generation (RAG) plus an autonomous agent—but adds dynamic control so the system can skip retrieval when it’s unnecessary or retrieve only when the question demands it.

The implementation described builds an agentic workflow in LangGraph with three nodes. First, a “decide” node determines whether retrieval is needed. In the example, this decision is implemented with simple Python keyword rules: if the question contains terms like “what,” “how,” “explain,” “describe,” or “tell,” the workflow sets a boolean flag (needs_retrieval) to true; otherwise it stays false. Second, a “retrieve” node calls a retriever backed by a vector store (FAISS in the code) to fetch relevant documents. Third, a “generate” node uses an LLM (configured as GPT-4.1) to produce the final answer. If documents exist, the prompt includes the retrieved context; if not, the prompt relies on the LLM alone.
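The keyword rule behind the decide node can be sketched in plain Python. This is a minimal stand-in following the description above; the exact helper name and tokenization are assumptions, not the video's verbatim code:

```python
# Sketch of the "decide" node: a simple keyword rule sets needs_retrieval.
RETRIEVAL_KEYWORDS = {"what", "how", "explain", "describe", "tell"}

def decide_retrieval(state: dict) -> dict:
    """Set the needs_retrieval flag based on keyword matching in the question."""
    words = state["question"].lower().split()
    state["needs_retrieval"] = any(w.strip("?.,!") in RETRIEVAL_KEYWORDS for w in words)
    return state
```

With this rule, “What is LangGraph?” flags retrieval because it contains “what,” while a question with none of the keywords leaves the flag false.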

LangGraph’s conditional edges are the mechanism that turns the boolean decision into branching behavior. The graph starts at the decide node, then routes either to the retrieve node (when needs_retrieval is true) or directly to the generate node (when it’s false). After retrieval, the workflow flows into generation, and generation ends the run. This structure mirrors the conceptual difference: the retrieval step becomes optional and controlled by the agent rather than mandatory.
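A framework-free sketch of this branching may help: here `should_retrieve` plays the role of the routing function LangGraph's conditional edges would call, and `run_workflow` is a stand-in for LangGraph's runtime (the stub node functions are illustrative, not the video's code):

```python
def should_retrieve(state: dict) -> str:
    """Conditional edge: map the boolean flag to the next node's name."""
    return "retrieve" if state["needs_retrieval"] else "generate"

def run_workflow(state: dict, nodes: dict):
    """decide -> (retrieve, if flagged) -> generate -> END."""
    state = nodes["decide"](state)
    path = ["decide"]
    if should_retrieve(state) == "retrieve":
        state = nodes["retrieve"](state)
        path.append("retrieve")
    state = nodes["generate"](state)  # reached from retrieve, or directly from decide
    path.append("generate")
    return state, path

# Stub nodes, just to exercise the two possible paths through the graph.
nodes = {
    "decide": lambda s: {**s, "needs_retrieval": "what" in s["question"].lower()},
    "retrieve": lambda s: {**s, "documents": ["retrieved context"]},
    "generate": lambda s: {**s, "answer": "answer"},
}
```

Running it on a question that matches the keyword rule produces the path decide → retrieve → generate; a non-matching question skips straight from decide to generate.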

The walkthrough also covers the practical setup needed to run the workflow. It installs the LangGraph and LangChain packages (including langgraph and langchain-openai), loads an OpenAI API key from environment variables, and defines a typed state object (AgentState) holding the question, the list of retrieved documents, the final answer, and the needs_retrieval flag. For demonstration, it creates sample text, converts it into Document objects, and builds a retriever from those documents. Two test queries—“What is langgraph?” and “How does rag work?”—show the branching in action: both match the keyword rules, so each triggers retrieval (the second returns four documents in the example) before the LLM generates a response grounded in that context.
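The typed state can be sketched with a TypedDict. The field names follow the transcript; documents are simplified to plain strings here to keep the sketch dependency-free, whereas the video stores LangChain Document objects:

```python
from typing import List, TypedDict

class AgentState(TypedDict):
    """Shared state passed between the decide, retrieve, and generate nodes."""
    question: str
    documents: List[str]  # LangChain Document objects in the original; strings here
    answer: str
    needs_retrieval: bool

initial: AgentState = {
    "question": "What is langgraph?",
    "documents": [],
    "answer": "",
    "needs_retrieval": False,
}
```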

Overall, the core takeaway is that agentic RAG operationalizes autonomy by making retrieval a controllable decision inside the workflow, implemented through LangGraph nodes, state, and conditional routing. That design is positioned as the foundation for more advanced autonomous RAG systems across different use cases, where retrieval strategy can vary per question rather than follow a single fixed path.

Cornell Notes

Agentic RAG adds an autonomous decision layer on top of traditional RAG. Instead of always retrieving from a vector database, a “decide” node sets a needs_retrieval flag based on the question, then LangGraph routes execution either to a retrieval node or straight to generation. When retrieval runs, a retriever (backed by a vector store such as FAISS) fetches documents and the generate node prompts GPT-4.1 with that context. When retrieval is skipped, the generate node prompts the LLM without retrieved documents. This matters because it makes retrieval dynamic—potentially reducing unnecessary context injection and tailoring the workflow to each query.

What makes agentic RAG different from traditional RAG in practice?

Traditional RAG follows a fixed pipeline: query → retrieve context from a vector DB → combine context with a prompt → generate. Agentic RAG inserts an autonomous decision step that determines whether retrieval should happen at all. In the example workflow, the decide node sets needs_retrieval to true/false, and conditional edges route to either the retrieve node or directly to the generate node.

Which decisions does the autonomous agent need to make for agentic RAG to work well?

The transcript highlights five decision points: when to retrieve, what to retrieve, where to retrieve, how many times to retrieve, and how to use the retrieved context in the prompt. In the implementation shown, only the “when to retrieve” decision is modeled explicitly via keyword rules, but the workflow structure is set up to support richer decision logic later (e.g., LLM-based routing).

How does the LangGraph workflow implement branching between retrieval and generation?

LangGraph uses conditional edges. After the start node reaches the decide node, the graph calls a routing function (should_retrieve) that checks state.needs_retrieval. If true, it routes to the retrieve node; if false, it routes to the generate node. After retrieval, the graph connects retrieve → generate → end, and generation also ends the run when reached directly.

What data does the workflow keep in state, and why?

The AgentState dictionary stores question (str), documents (list of Document), answer (str), and needs_retrieval (bool). This state is passed between nodes so the decide node can influence routing, the retrieve node can populate documents, and the generate node can condition its prompt on whether documents are present.

How is retrieval performed in the example implementation?

The code builds a vector store retriever from sample text by converting the text into Document objects and creating a FAISS index over them. The retrieve_documents function calls retriever.invoke(question) to fetch relevant documents, which are stored back into state.documents for use in the generation prompt.
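The video's retriever is backed by a vector store via LangChain; as a dependency-free stand-in, the toy retriever below shows the same invoke(question) → documents contract and how the retrieve node writes results back into state. The word-overlap scoring is purely illustrative, not the embedding similarity a real vector store uses:

```python
class ToyRetriever:
    """Illustrative retriever: ranks documents by word overlap with the question."""
    def __init__(self, documents):
        self.documents = documents

    def invoke(self, question: str, k: int = 2):
        q_words = set(question.lower().split())
        scored = sorted(
            self.documents,
            key=lambda d: len(q_words & set(d.lower().split())),
            reverse=True,
        )
        return scored[:k]

def retrieve_documents(state: dict, retriever) -> dict:
    """The retrieve node: fetch relevant documents and store them in state."""
    state["documents"] = retriever.invoke(state["question"])
    return state
```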

How does the generate node change its prompt depending on retrieval results?

If state.documents contains retrieved context, the prompt includes the context and asks the LLM to answer the question using that information. If documents are empty, the prompt omits context and asks the LLM to answer directly. This conditional prompt behavior matches the goal of making retrieval optional and question-dependent.
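This conditional prompt logic can be sketched as follows. The prompt wording is illustrative; in the video the resulting prompt is then sent to GPT-4.1, which is omitted here:

```python
def build_prompt(state: dict) -> str:
    """Generate node's prompt: include retrieved context only when present."""
    if state.get("documents"):
        context = "\n".join(state["documents"])
        return (
            "Answer the question using the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {state['question']}"
        )
    return f"Answer the question directly.\n\nQuestion: {state['question']}"
```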

Review Questions

  1. In the provided LangGraph workflow, what condition determines whether the retrieve node is executed?
  2. Describe the sequence of nodes and edges from start to end for both cases: needs_retrieval = true and needs_retrieval = false.
  3. What fields are stored in AgentState, and how does each field get used by different nodes?

Key Points

  1. Agentic RAG makes retrieval conditional by using an autonomous decision layer rather than always running a fixed retrieve-then-generate pipeline.

  2. An autonomous agent’s key job is deciding when to retrieve, what/where/how much to retrieve, and how to feed retrieved context into the LLM prompt.

  3. LangGraph implements the agentic control flow using nodes (decide, retrieve, generate) and conditional edges based on a boolean flag in shared state.

  4. The example uses a typed AgentState holding question, documents, answer, and needs_retrieval to coordinate decisions and data passing across nodes.

  5. Retrieval is performed via a vector store retriever (FAISS in the example) using retriever.invoke(question).

  6. The generate node prompts GPT-4.1 with retrieved context when documents exist, and prompts without context when retrieval is skipped.

  7. Testing demonstrates the branching behavior: when needs_retrieval is true, the workflow retrieves documents (multiple in the example) before generating a grounded answer.

Highlights

Agentic RAG’s core shift is turning retrieval into a decision: the system can route to generation without fetching context when it’s not needed.
Conditional edges in LangGraph translate a boolean needs_retrieval flag into real branching between retrieve and generate nodes.
The workflow’s state object (question, documents, answer, needs_retrieval) is the glue that lets the decide node control downstream behavior.
In the demo, GPT-4.1 generates answers grounded in retrieved documents when retrieval runs, and answers directly when it doesn’t.
