What Is Agentic RAG?
Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
Agentic RAG upgrades traditional retrieval-augmented generation by adding an intelligent routing layer that decides which knowledge base to consult before the LLM generates an answer. In a standard RAG setup, a user query is embedded, sent to a single vector database to fetch relevant context, and then the LLM uses that context (plus a prompt) to produce the final response. The LLM’s job is largely limited to summarizing or generating based on retrieved text, because the system assumes one fixed source of truth.
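The single-store flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not a real library API: `VECTOR_DB`, `retrieve`, and `generate` are hypothetical stand-ins, and the word-overlap ranking stands in for the embedding-plus-similarity search a real vector database would perform.

```python
# Minimal sketch of traditional RAG: one vector store, one retrieval
# step, then a single LLM call that generates from the retrieved context.
# All names here are illustrative assumptions, not a real API.

VECTOR_DB = {
    "agentic rag adds a routing agent over retrievers": "Doc A: agentic RAG overview",
    "udemy hosts recorded courses on many topics": "Doc B: Udemy course catalog",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Stand-in similarity search: rank stored docs by word overlap with
    # the query (a real system would embed the query and compare vectors).
    q_words = set(query.lower().split())
    ranked = sorted(VECTOR_DB.items(),
                    key=lambda kv: len(q_words & set(kv[0].split())),
                    reverse=True)
    return [doc for _, doc in ranked[:k]]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for the single LLM call: shows the prompt shape it receives.
    return f"Answer to '{query}' grounded in: {'; '.join(context)}"

print(generate("what is agentic rag", retrieve("what is agentic rag")))
```

Note that the LLM appears exactly once, at the end, which is the limitation the agentic variant addresses.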
The transcript contrasts that fixed flow with an agentic workflow built for messy, real-world knowledge. Instead of one vector database, the system can maintain multiple specialized stores—for example, one holding Udemy course content and another holding live-course materials. When a user asks for “recent Udemy courses on agentic AI,” the key problem becomes deciding whether the query should hit the Udemy database or the live-courses database. In traditional RAG, that decision would typically be hard-coded or handled outside the retrieval step. In agentic RAG, the LLM is wrapped into an agent that can dynamically route the query to the most relevant retriever tool (each tool corresponds to a vector database).
That routing step is the central difference. The agent is integrated with multiple retriever tools, so it can choose DB2 for Udemy-related questions and DB3 for live-course questions. Once the agent selects the right database, retrieval returns more accurate context, and the LLM then summarizes that context into the final answer. The transcript emphasizes that this improves accuracy because the system avoids forcing every query through the same retrieval pipeline when different queries map better to different knowledge sources.
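The routing step can be sketched as follows. The keyword check in `route` is a hypothetical stand-in for the LLM-powered routing decision the transcript describes; in a real agent the LLM would pick a tool from its description. All function and variable names here are illustrative assumptions.

```python
# Sketch of agentic routing: each retriever tool corresponds to one
# vector database, and the agent picks the tool per query.

def udemy_retriever(query: str) -> str:
    # Stand-in for a search over the Udemy-content vector DB (DB2).
    return f"[Udemy DB] context for: {query}"

def live_course_retriever(query: str) -> str:
    # Stand-in for a search over the live-courses vector DB (DB3).
    return f"[Live-courses DB] context for: {query}"

RETRIEVER_TOOLS = {
    "udemy": udemy_retriever,
    "live": live_course_retriever,
}

def route(query: str) -> str:
    # Hypothetical router: a keyword check stands in for the LLM
    # deciding which knowledge base the query maps to.
    return "udemy" if "udemy" in query.lower() else "live"

def answer(query: str) -> str:
    tool = RETRIEVER_TOOLS[route(query)]
    context = tool(query)
    # The LLM then summarizes the retrieved context into the final answer.
    return f"Summary of {context}"

print(answer("recent Udemy courses on agentic AI"))
print(answer("schedule for the live agentic AI batch"))
```

Because each tool is just an entry in `RETRIEVER_TOOLS`, adding a new knowledge base means registering one more retriever, which is the modularity point the transcript returns to later.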
The explanation also frames agentic RAG as a framework that enhances traditional RAG by incorporating intelligent agents to handle complex tasks and make decisions dynamically. It highlights a practical implementation pattern using LangGraph-style routing workflows: a request enters the workflow, the agent checks its relevance against the connected vector store, and the system branches to different retrieval paths. If the needed information is missing from one vector database, the workflow can evaluate further conditions or fall back to the LLM's own knowledge, with additional routing rules controlling what happens next.
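The branch-and-fallback pattern can be sketched in plain Python as a tiny state machine; LangGraph itself expresses the same idea with a `StateGraph` and conditional edges, but the node names, store contents, and relevance check below are illustrative assumptions, not an actual LangGraph API.

```python
# Plain-Python sketch of a LangGraph-style routing workflow: retrieve,
# check whether anything relevant came back, then branch to a grounded
# answer or fall back to the LLM alone. All names are hypothetical.

COURSE_STORE = {"agentic rag": "Course notes on agentic RAG routing."}

def retrieve_node(state: dict) -> dict:
    # Relevance check: keep only docs whose key appears in the query
    # (a stand-in for a similarity-score threshold).
    hits = [doc for key, doc in COURSE_STORE.items()
            if key in state["query"].lower()]
    return {**state, "context": hits}

def route_after_retrieval(state: dict) -> str:
    # Conditional edge: branch on whether retrieval produced context.
    return "generate" if state["context"] else "fallback"

def generate_node(state: dict) -> dict:
    return {**state, "answer": f"Grounded answer from: {state['context'][0]}"}

def fallback_node(state: dict) -> dict:
    # Information missing from the store: rely on the LLM's own knowledge.
    return {**state, "answer": "LLM-only answer (no retrieved context)."}

def run(query: str) -> str:
    state = retrieve_node({"query": query})
    branch = route_after_retrieval(state)
    state = generate_node(state) if branch == "generate" else fallback_node(state)
    return state["answer"]

print(run("tell me about agentic rag"))
print(run("what is the weather today"))
```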
Finally, the transcript positions agentic RAG as a scalable approach. As more specialized vector databases are added—such as company policy documents or other domain corpora—each becomes a retriever tool the agent can select from. The result is a more modular system where the agent chooses the best context source per query, rather than relying on a single retrieval store for every question.
Cornell Notes
Traditional RAG takes a user query, retrieves relevant context from one vector database, and then uses the LLM mainly to summarize or generate an answer from that retrieved text. Agentic RAG keeps the retrieval-and-summarize pattern but adds an agent that can route each query to the most appropriate retriever tool (i.e., the right vector database). With multiple stores—such as one for Udemy courses and another for live courses—the agent decides whether the query should go to DB2 or DB3, improving the accuracy of retrieved context. Routing can be implemented with a workflow (e.g., LangGraph-style) that checks relevance and applies conditions or fallbacks when information is missing. This makes the system more scalable as more specialized databases are added.
- How does a traditional RAG system handle a user query end to end?
- Why does the transcript say the LLM is used only once in traditional RAG?
- What changes in agentic RAG when multiple vector databases exist?
- How does the agent decide which database to query?
- What happens when the requested information is not present in the selected vector database?
- Why does the transcript argue that splitting knowledge into specialized vector databases can improve results?
Review Questions
- In traditional RAG, what is the sequence of operations from query to final answer, and where does the LLM fit in?
- In agentic RAG, what problem does routing solve, and how does routing affect retrieval context quality?
- How can a routing workflow handle cases where the selected vector database lacks the needed information?
Key Points
1. Traditional RAG retrieves context from a single vector database, then uses the LLM mainly to generate from that context plus a prompt.
2. Agentic RAG adds an agent layer that dynamically routes each query to the most relevant retriever tool (vector database).
3. Multiple specialized vector databases (e.g., Udemy vs. live courses) can improve answer accuracy by ensuring the right context source is queried.
4. Routing decisions are made by an LLM-powered agent integrated with retriever tools, reducing the need for manual hard-coding.
5. Workflow-based routing (e.g., LangGraph-style) can include relevance checks, conditional branches, and fallbacks when information is missing.
6. As more domains are added (like company policy), each new vector database can be registered as a retriever tool the agent can choose from.