
What Is Agentic RAG?

Krish Naik · 5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Traditional RAG retrieves context from a single vector database and uses the LLM mainly to generate from that context plus a prompt; agentic RAG adds an agent that dynamically routes each query to the most relevant of several specialized vector databases before generation.

Briefing

Agentic RAG upgrades traditional retrieval-augmented generation by adding an intelligent routing layer that decides which knowledge base to consult before the LLM generates an answer. In a standard RAG setup, a user query is embedded, sent to a single vector database to fetch relevant context, and then the LLM uses that context (plus a prompt) to produce the final response. The LLM’s job is largely limited to summarizing or generating based on retrieved text, because the system assumes one fixed source of truth.
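The fixed flow just described can be sketched in plain Python. This is only a shape-of-the-pipeline illustration: `retrieve` uses a toy word-overlap score in place of real embedding and vector-similarity search, and `llm` is a stand-in stub, not any actual model or library API.

```python
# Minimal sketch of the traditional RAG flow: one store, fixed pipeline.
# `retrieve` and `llm` are toy stand-ins (a real system would embed the
# query, run vector similarity search, then call an actual model).

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Toy relevance score: how many query words appear in the document.
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def llm(prompt: str) -> str:
    # Stand-in model: just echoes the context portion of the prompt.
    return "Answer based on: " + prompt.split("Context: ", 1)[1]

def traditional_rag(query: str, docs: list[str]) -> str:
    context = " ".join(retrieve(query, docs))          # 1. retrieve from the single store
    prompt = f"Question: {query}\nContext: {context}"  # 2. combine context with the prompt
    return llm(prompt)                                 # 3. LLM generates the final answer

docs = ["Udemy hosts recorded agentic AI courses.",
        "Live cohort courses run every quarter."]
print(traditional_rag("agentic AI courses on Udemy", docs))
```

Note that the LLM appears only at the last step; every retrieval decision is baked into the pipeline, which is exactly the limitation agentic RAG addresses.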

The transcript contrasts that fixed flow with an agentic workflow built for messy, real-world knowledge. Instead of one vector database, the system can maintain multiple specialized stores—for example, one holding Udemy course content and another holding live-course materials. When a user asks for “recent Udemy courses on agentic AI,” the key problem becomes deciding whether the query should hit the Udemy database or the live-courses database. In traditional RAG, that decision would typically be hard-coded or handled outside the retrieval step. In agentic RAG, the LLM is wrapped into an agent that can dynamically route the query to the most relevant retriever tool (each tool corresponds to a vector database).

That routing step is the central difference. The agent is integrated with multiple retriever tools, so it can choose DB2 for Udemy-related questions and DB3 for live-course questions. Once the agent selects the right database, retrieval returns more accurate context, and the LLM then summarizes that context into the final answer. The transcript emphasizes that this improves accuracy because the system avoids forcing every query through the same retrieval pipeline when different queries map better to different knowledge sources.

The explanation also frames agentic RAG as a framework that enhances traditional RAG by incorporating intelligent agents to handle complex tasks and make decisions dynamically. It highlights a practical implementation pattern using LangGraph-style routing workflows: a request enters the workflow, the agent checks relevance against the connected vector store, and the system can branch to different retrieval paths. If the needed information is missing from one vector database, the workflow can continue to other conditions or fall back to the LLM’s own capabilities, with additional routing rules controlling what happens next.
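The branching workflow can be sketched without LangGraph itself; real LangGraph code would express the same logic as graph nodes and conditional edges, while the store names and the word-overlap relevance check below are illustrative assumptions.

```python
# Illustrative sketch of the routing workflow: check relevance against
# each connected store, branch into its retrieval path on a match, and
# fall back to the LLM's own knowledge when no store matches.

def is_relevant(query: str, store: list[str]) -> bool:
    q_words = set(query.lower().split())
    return any(q_words & set(doc.lower().split()) for doc in store)

def route_and_answer(query, stores, llm_fallback):
    for name, store in stores.items():            # conditional branches, in order
        if is_relevant(query, store):
            hits = [d for d in store
                    if set(query.lower().split()) & set(d.lower().split())]
            return f"[{name}] " + " ".join(hits)  # retrieval path for this store
    return llm_fallback(query)                    # no branch matched: LLM fallback

stores = {
    "db2_udemy": ["Udemy agentic AI course: recorded lectures."],
    "db3_live":  ["Live course schedule for the May cohort."],
}
print(route_and_answer("recent udemy courses on agentic ai", stores,
                       lambda q: f"LLM answer for: {q}"))
```

A query mentioning Udemy lands in `db2_udemy`'s branch, while a query neither store can serve falls through to the LLM-only path.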

Finally, the transcript positions agentic RAG as a scalable approach. As more specialized vector databases are added—such as company policy documents or other domain corpora—each becomes a retriever tool the agent can select from. The result is a more modular system where the agent chooses the best context source per query, rather than relying on a single retrieval store for every question.

Cornell Notes

Traditional RAG takes a user query, retrieves relevant context from one vector database, and then uses the LLM mainly to summarize or generate an answer from that retrieved text. Agentic RAG keeps the retrieval-and-summarize pattern but adds an agent that can route each query to the most appropriate retriever tool (i.e., the right vector database). With multiple stores—such as one for Udemy courses and another for live courses—the agent decides whether the query should go to DB2 or DB3, improving the accuracy of retrieved context. Routing can be implemented with a workflow (e.g., LangGraph-style) that checks relevance and applies conditions or fallbacks when information is missing. This makes the system more scalable as more specialized databases are added.

How does a traditional RAG system handle a user query end to end?

A user query enters an LLM application along with a prompt (instruction for how the LLM should behave). In a traditional RAG setup, the query is sent to a single vector database (vector DB) that stores embedded text from a specific knowledge source. The database returns relevant context, which is combined with the prompt and then passed to the LLM. The LLM is used primarily to summarize or generate the final output based on that retrieved context.

Why does the transcript say the LLM is used only once in traditional RAG?

In the described flow, retrieval happens first: the query goes to the vector database, which returns context. After that, the LLM is invoked to generate the answer using the retrieved context plus the prompt. The LLM’s role is therefore concentrated on the final generation step rather than on making retrieval-source decisions.

What changes in agentic RAG when multiple vector databases exist?

Instead of forcing all queries through one vector database, agentic RAG uses an agent that can route the query to the correct retriever tool. For example, DB2 contains Udemy information while DB3 contains live-course information. When a query asks for “Udemy courses on agentic AI,” the agent selects DB2; when the query targets live courses, it selects DB3. This routing improves the accuracy of the retrieved context before the LLM summarizes.

How does the agent decide which database to query?

The transcript describes converting the LLM into an agent integrated with multiple retriever tools. The agent uses the LLM’s capabilities to choose the appropriate tool based on the query’s intent and expected content. In effect, it performs dynamic selection—routing to DB2 for Udemy-related questions and to DB3 for live-course-related questions—so the system retrieves the most relevant documents.
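One way to picture that dynamic selection is a list of retriever tools, each pairing a database with a description. In a real agent the descriptions would be handed to the LLM (e.g., via function calling) and it would pick; the keyword scoring below merely stands in for that judgment, and all names here are hypothetical.

```python
# Sketch of dynamic tool selection. Each retriever tool pairs a vector DB
# with a description; the agent picks the tool whose description best
# matches the query. A real agent delegates this choice to the LLM.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RetrieverTool:
    name: str
    description: str
    retrieve: Callable[[str], List[str]]

def choose_tool(query: str, tools: List[RetrieverTool]) -> RetrieverTool:
    q_words = set(query.lower().split())
    # Stand-in for the LLM's judgment: overlap between query and description.
    return max(tools, key=lambda t: len(q_words & set(t.description.lower().split())))

db2 = RetrieverTool("db2_udemy", "recorded udemy courses on agentic ai",
                    lambda q: ["Agentic AI bootcamp (Udemy, recorded)"])
db3 = RetrieverTool("db3_live", "live cohort course schedules and materials",
                    lambda q: ["Agentic AI live cohort, starts in May"])

tool = choose_tool("recent udemy courses on agentic ai", [db2, db3])
print(tool.name, "->", tool.retrieve("agentic ai"))
```

Adding a new knowledge source is then just appending another `RetrieverTool`, which is the scalability property the transcript emphasizes.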

What happens when the requested information is not present in the selected vector database?

A routing workflow can include conditions and fallbacks. The transcript’s example describes an agent checking relevance in the connected vector store; if the data isn’t found, execution continues with additional routing rules. The LLM can still generate an answer even when retrieval fails, and the workflow can be extended with more routers or logic to handle missing information.
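That continue-on-miss behavior can be sketched as a chain of routing rules (the `(condition, handler)` structure is an assumption for illustration, not a framework API): a handler that finds nothing returns `None`, execution moves to the next rule, and the bare LLM is the final catch-all.

```python
# Sketch of routing with fallbacks. Each router is a (condition, handler)
# pair; a handler that finds no data returns None, and execution continues
# through later rules, ending at the bare LLM as the last resort.

def run_routers(query, routers, llm):
    for condition, handler in routers:
        if condition(query):
            result = handler(query)
            if result is not None:        # data found in this store
                return result
            # data missing here: fall through to the next routing rule
    return llm(query)                     # last resort: LLM's own knowledge

routers = [
    (lambda q: "udemy" in q.lower(),
     lambda q: "DB2 context: recorded Udemy agentic AI courses"),
    (lambda q: "live" in q.lower(),
     lambda q: None),                     # simulate: live-course DB lacks this data
]
print(run_routers("live course on quantum computing", routers,
                  lambda q: f"LLM-only answer for: {q}"))
```

Here the live-course router matches but comes back empty, so the workflow still produces an answer via the LLM fallback instead of failing.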

Why does the transcript argue that splitting knowledge into specialized vector databases can improve results?

When each vector database is focused on a specific domain (e.g., Udemy courses vs. live courses vs. company policy), the agent can route each query to the most relevant store. That targeted retrieval yields more accurate context, which in turn leads to more accurate final outputs. The approach also scales: new specialized databases can be added as additional retriever tools.

Review Questions

  1. In traditional RAG, what is the sequence of operations from query to final answer, and where does the LLM fit in?
  2. In agentic RAG, what problem does routing solve, and how does routing affect retrieval context quality?
  3. How can a routing workflow handle cases where the selected vector database lacks the needed information?

Key Points

  1. Traditional RAG retrieves context from a single vector database, then uses the LLM mainly to generate from that context plus a prompt.

  2. Agentic RAG adds an agent layer that dynamically routes each query to the most relevant retriever tool (vector database).

  3. Multiple specialized vector databases (e.g., Udemy vs. live courses) can improve answer accuracy by ensuring the right context source is queried.

  4. Routing decisions are made by an LLM-powered agent integrated with retriever tools, reducing the need for manual hard-coding.

  5. Workflow-based routing (e.g., LangGraph-style) can include relevance checks, conditional branches, and fallbacks when information is missing.

  6. As more domains are added (like company policy), each new vector database can be registered as a retriever tool the agent can choose from.

Highlights

The core upgrade from traditional RAG to agentic RAG is not new retrieval—it’s dynamic routing to the right vector database before generation.
In the example, an agent routes Udemy-related queries to DB2 and live-course queries to DB3, improving the relevance of retrieved context.
A routing workflow can continue execution when a vector store lacks the requested information, using additional conditions or LLM fallback behavior.
