Advanced RAG: How Corrective RAG (CRAG) Solves Traditional RAG Problems | CampusX
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Corrective RAG (CRAG) is presented as a fix for a core weakness in traditional RAG: it blindly trusts retrieved documents, so when retrieval returns irrelevant (or only partially relevant) context, the LLM can still generate a confident but wrong answer—sometimes with dangerous downstream consequences in business workflows. The transcript walks through a conventional RAG pipeline (query → embedding → vector search retrieval → prompt with retrieved context → LLM generation) and pinpoints the failure mode: semantic search can return “something” even when the retrieved documents don’t actually contain the needed knowledge. An example asks “What is an LLM?” while the vector store holds only machine-learning books; the system still retrieves content and forces the LLM to answer from it, creating a high risk of hallucination.
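For reference, the single-pass pipeline the transcript describes reduces to a few lines. This is a minimal sketch, not any library's actual API: `embed`, `search`, and `llm` are placeholder callables for whichever embedding model, vector store, and chat model are in use.

```python
# Minimal single-pass RAG sketch. All three callables are hypothetical
# stand-ins, passed in so the skeleton stays self-contained.
def traditional_rag(query, embed, search, llm, k=4):
    query_vec = embed(query)        # query -> embedding
    docs = search(query_vec, k)     # nearest-neighbor search always returns k
                                    # chunks, even when none cover the topic
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)              # the model must answer from this context,
                                    # relevant or not: the failure mode CRAG targets
```

Note that nothing in this flow ever asks whether `docs` are actually useful; that is exactly the gap the evaluation step below fills.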
CRAG’s central move is to insert an evaluation step between retrieval and generation. After retrieval, a “retrieval evaluator” model checks whether the retrieved documents are useful for answering the query. The evaluator routes the request into three cases: (1) retrieved documents are relevant → proceed with normal RAG-style generation using those documents; (2) retrieved documents are not relevant → skip generation from the bad context and instead use external knowledge (a web search tool) to fetch new documents, then generate from that; (3) retrieved documents are partially useful → refine the good parts and supplement missing information with web results, then merge both sources for generation. The transcript emphasizes that CRAG does not assume retrieval is correct; it treats retrieval quality as uncertain and actively manages it.
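The three-way control flow can be summarized as a small dispatcher. Everything here (`retrieval_evaluator`, `refine`, `rewrite_query`, `web_search`, `generate`) is a hypothetical stand-in for a component described above, not a specific framework's API:

```python
def crag_answer(query, docs):
    verdict = retrieval_evaluator(query, docs)  # "correct" | "incorrect" | "ambiguous"
    if verdict == "correct":
        context = refine(query, docs)                    # case 1: internal docs only
    elif verdict == "incorrect":
        web_docs = web_search(rewrite_query(query))      # case 2: web knowledge only
        context = refine(query, web_docs)
    else:
        web_docs = web_search(rewrite_query(query))      # case 3: merge both sources
        context = refine(query, docs + web_docs)
    return generate(query, context)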
To make the idea concrete, the walkthrough builds CRAG step-by-step on top of a working RAG “chatbot” using a LangGraph-style state machine. First, it adds “knowledge refinement” to improve answer quality even when retrieved chunks contain extra, off-topic text due to chunking. This refinement decomposes documents into sentence-level strips, filters strips by relevance using an LLM-based yes/no criterion, recombines only the kept strips, and then feeds the refined context to the generator. The transcript demonstrates that when the query is covered by the books (e.g., bias-variance tradeoff), retrieval and refinement produce accurate, on-topic answers; when the query is not covered (e.g., “What is a transformer in deep learning”), the system can still produce an answer—but the retrieved chunks lack transformer-specific coverage, so the output risks being grounded in the LLM’s parametric knowledge.
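In code, the strip, filter, and recombine loop might look like the sketch below. The regex sentence splitter and the yes/no prompt wording are assumptions; `llm` again stands in for any chat-completion callable.

```python
import re

def refine(query, docs, llm):
    """Keep only the sentence strips an LLM judges relevant to the query."""
    strips = []
    for doc in docs:
        # decompose each chunk into sentence-level strips (naive splitter)
        strips += [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    kept = []
    for strip in strips:
        # LLM-based yes/no relevance criterion, one strip at a time
        verdict = llm(
            f"Question: {query}\nSentence: {strip}\n"
            "Does this sentence help answer the question? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(strip)
    return "\n".join(kept)  # recombine only the kept strips as refined context
```

Judging one strip per LLM call is expensive but simple; batching several strips into one call is a common optimization and does not change the idea.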
Next, the build adds “retrieval evaluation” using thresholding. Each retrieved document gets a relevance score (0–1) and a reason; documents above a lower threshold are treated as “good,” and routing decisions depend on whether at least one good document exists, whether all documents are below threshold, or whether scores are mixed (ambiguous). In the implementation shown, generation proceeds only with “good” documents; low-scoring chunks are excluded.
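A threshold router consistent with this description might look like the following. The 0.5 cutoff is a hypothetical value, and reading the three cases as all-above / all-below / mixed is one plausible interpretation of the transcript; only above-threshold documents are ever passed forward.

```python
THRESHOLD = 0.5  # hypothetical cutoff standing in for the "lower threshold"

def route(scored_docs):
    """scored_docs: list of (doc, score, reason) triples, scores in [0, 1]."""
    good = [doc for doc, score, _ in scored_docs if score >= THRESHOLD]
    if len(good) == len(scored_docs):
        return "correct", good     # everything relevant: normal generation
    if not good:
        return "incorrect", []     # nothing usable: fall back to web search
    return "ambiguous", good       # mixed scores: merge good docs with web results
```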
Then CRAG expands to external knowledge: when retrieval is incorrect, the system rewrites the query for search (query rewriting) and uses a web search tool (Tavily) to fetch documents, refines those web documents, and generates from the refined external context. Finally, the ambiguous case is handled by merging internal “good” documents with refined web documents and running the same refinement pipeline over the combined context.
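A sketch of that external-knowledge path, assuming the public `tavily-python` client and a valid Tavily API key; `llm` and `refine` are the same stand-ins as in the earlier sketches, and the exact response fields should be checked against Tavily's documentation.

```python
from tavily import TavilyClient

tavily = TavilyClient(api_key="...")  # requires a Tavily API key

def web_fallback(query, llm):
    # query rewriting: turn the user question into a search-friendly query
    search_query = llm(
        "Rewrite this question as a concise web search query:\n" + query
    )
    response = tavily.search(search_query, max_results=5)
    web_docs = [r["content"] for r in response["results"]]
    # reuse the same sentence-strip refinement on the web documents
    return refine(query, web_docs, llm)
```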
The overall takeaway is an architecture-level reliability upgrade: CRAG turns RAG from a single-pass retrieval-and-trust system into a routed, quality-aware pipeline that can refuse bad internal context, supplement with web knowledge, and refine mixed evidence before generation—reducing the likelihood of hallucinated answers when the vector store misses the needed facts.
Cornell Notes
CRAG (Corrective RAG) addresses a key RAG failure: retrieved documents may be irrelevant, yet the LLM is still prompted to answer using that context, leading to confident wrong outputs. CRAG inserts a retrieval-evaluation step after vector search and routes the request into three cases: relevant → refine and generate from internal docs; irrelevant → rewrite the query, web-search, refine web docs, then generate; ambiguous/partially relevant → refine and merge internal “good” docs with refined web docs before generation. The transcript also shows how knowledge refinement works at the chunk level by decomposing documents into sentence strips, filtering strips by relevance, and recombining only kept content. The result is a reliability-focused RAG pipeline that treats retrieval quality as uncertain rather than guaranteed.
What exactly goes wrong in traditional RAG when retrieval is imperfect?
How does CRAG decide whether retrieved documents are safe to use?
Why add “knowledge refinement” even when retrieval returns relevant documents?
What changes when CRAG moves from “incorrect retrieval” to using the web?
How does CRAG handle the ambiguous case differently from correct and incorrect?
Review Questions
- In traditional RAG, what mechanism forces the LLM to answer even when retrieved documents are irrelevant, and why does that increase hallucination risk?
- Describe the three CRAG routing cases and what data sources (internal docs, web docs, or both) feed into generation for each case.
- How does sentence-level strip filtering in knowledge refinement improve context quality compared with using retrieved chunks directly?
Key Points
1. Traditional RAG can fail catastrophically when semantic retrieval returns irrelevant chunks, because the LLM is still prompted to answer using that context.
2. CRAG inserts a retrieval-evaluation step after vector search to score document relevance and route the request into correct, incorrect, or ambiguous paths.
3. Knowledge refinement improves RAG output by decomposing retrieved text into sentence strips, filtering strips by relevance, and recombining only kept content.
4. When retrieval is incorrect, CRAG rewrites the query for search, uses Tavily to fetch external documents, refines them, and then generates an answer from the refined web context.
5. When retrieval is ambiguous, CRAG merges internal “good” documents with refined web documents and runs refinement over the combined evidence before generation.
6. In the implementation shown, generation uses only documents whose evaluation scores exceed the lower threshold; low-scoring chunks are excluded from the context fed to the LLM.