Advanced RAG: How Corrective RAG (CRAG) Solves Traditional RAG Problems | CampusX
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Corrective RAG (CRAG) is presented as a fix for a core weakness in traditional RAG: it blindly trusts retrieved documents, so when retrieval returns irrelevant (or only partially relevant) context, the LLM can still generate a confident but wrong answer—sometimes with dangerous downstream consequences in business workflows. The transcript walks through a conventional RAG pipeline (query → embedding → vector search retrieval → prompt with retrieved context → LLM generation) and pinpoints the failure mode: semantic search can return “something” even when the retrieved documents don’t actually contain the needed knowledge. An example asks “What is an LLM?” while the vector store holds only machine-learning books; the system still retrieves content and forces the LLM to answer from it, creating a high risk of hallucination.
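For reference, the single-pass pipeline the transcript describes reduces to a few lines. This is a minimal sketch, not any library's actual API: `embed`, `search`, and `llm` are placeholder callables for whichever embedding model, vector store, and chat model are in use.

```python
# Minimal single-pass RAG sketch. All three callables are hypothetical
# stand-ins, passed in so the skeleton stays self-contained.
def traditional_rag(query, embed, search, llm, k=4):
    query_vec = embed(query)        # query -> embedding
    docs = search(query_vec, k)     # nearest-neighbor search always returns k
                                    # chunks, even when none cover the topic
    context = "\n\n".join(docs)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)              # the model must answer from this context,
                                    # relevant or not: the failure mode CRAG targets
```

Note that nothing in this flow ever asks whether `docs` are actually useful; that is exactly the gap the evaluation step below fills.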
CRAG’s central move is to insert an evaluation step between retrieval and generation. After retrieval, a “retrieval evaluator” model checks whether the retrieved documents are useful for answering the query. The evaluator routes the request into three cases: (1) retrieved documents are relevant → proceed with normal RAG-style generation using those documents; (2) retrieved documents are not relevant → skip generation from the bad context and instead use external knowledge (a web search tool) to fetch new documents, then generate from that; (3) retrieved documents are partially useful → refine the good parts and supplement missing information with web results, then merge both sources for generation. The transcript emphasizes that CRAG does not assume retrieval is correct; it treats retrieval quality as uncertain and actively manages it.
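The three-way control flow can be summarized as a small dispatcher. Everything here (`retrieval_evaluator`, `refine`, `rewrite_query`, `web_search`, `generate`) is a hypothetical stand-in for a component described above, not a specific framework's API:

```python
def crag_answer(query, docs):
    verdict = retrieval_evaluator(query, docs)  # "correct" | "incorrect" | "ambiguous"
    if verdict == "correct":
        context = refine(query, docs)                    # case 1: internal docs only
    elif verdict == "incorrect":
        web_docs = web_search(rewrite_query(query))      # case 2: web knowledge only
        context = refine(query, web_docs)
    else:
        web_docs = web_search(rewrite_query(query))      # case 3: merge both sources
        context = refine(query, docs + web_docs)
    return generate(query, context)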
To make the idea concrete, the walkthrough builds CRAG step-by-step on top of a working RAG “chatbot” using a LangGraph-style state machine. First, it adds “knowledge refinement” to improve answer quality even when retrieved chunks contain extra, off-topic text due to chunking. This refinement decomposes documents into sentence-level strips, filters strips by relevance using an LLM-based yes/no criterion, recombines only the kept strips, and then feeds the refined context to the generator. The transcript demonstrates that when the query is covered by the books (e.g., bias-variance tradeoff), retrieval and refinement produce accurate, on-topic answers; when the query is not covered (e.g., “What is a transformer in deep learning”), the system can still produce an answer—but the retrieved chunks lack transformer-specific coverage, so the output risks being grounded in the LLM’s parametric knowledge.
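In code, the strip, filter, and recombine loop might look like the sketch below. The regex sentence splitter and the yes/no prompt wording are assumptions; `llm` again stands in for any chat-completion callable.

```python
import re

def refine(query, docs, llm):
    """Keep only the sentence strips an LLM judges relevant to the query."""
    strips = []
    for doc in docs:
        # decompose each chunk into sentence-level strips (naive splitter)
        strips += [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    kept = []
    for strip in strips:
        # LLM-based yes/no relevance criterion, one strip at a time
        verdict = llm(
            f"Question: {query}\nSentence: {strip}\n"
            "Does this sentence help answer the question? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(strip)
    return "\n".join(kept)  # recombine only the kept strips as refined context
```

Judging one strip per LLM call is expensive but simple; batching several strips into one call is a common optimization and does not change the idea.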
Next, the build adds “retrieval evaluation” using thresholding. Each retrieved document gets a relevance score (0–1) and a reason; documents above a lower threshold are treated as “good,” and routing decisions depend on whether at least one good document exists, whether all documents are below threshold, or whether scores are mixed (ambiguous). In the implementation shown, generation proceeds only with “good” documents; low-scoring chunks are excluded.
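A threshold router consistent with this description might look like the following. The 0.5 cutoff is a hypothetical value, and reading the three cases as all-above / all-below / mixed is one plausible interpretation of the transcript; only above-threshold documents are ever passed forward.

```python
THRESHOLD = 0.5  # hypothetical cutoff standing in for the "lower threshold"

def route(scored_docs):
    """scored_docs: list of (doc, score, reason) triples, scores in [0, 1]."""
    good = [doc for doc, score, _ in scored_docs if score >= THRESHOLD]
    if len(good) == len(scored_docs):
        return "correct", good     # everything relevant: normal generation
    if not good:
        return "incorrect", []     # nothing usable: fall back to web search
    return "ambiguous", good       # mixed scores: merge good docs with web results
```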
Then CRAG expands to external knowledge: when retrieval is incorrect, the system rewrites the query for search (query rewriting) and uses a web search tool (Tavily) to fetch documents, refines those web documents, and generates from the refined external context. Finally, the ambiguous case is handled by merging internal “good” documents with refined web documents and running the same refinement pipeline over the combined context.
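A sketch of that external-knowledge path, assuming the public `tavily-python` client and a valid Tavily API key; `llm` and `refine` are the same stand-ins as in the earlier sketches, and the exact response fields should be checked against Tavily's documentation.

```python
from tavily import TavilyClient

tavily = TavilyClient(api_key="...")  # requires a Tavily API key

def web_fallback(query, llm):
    # query rewriting: turn the user question into a search-friendly query
    search_query = llm(
        "Rewrite this question as a concise web search query:\n" + query
    )
    response = tavily.search(search_query, max_results=5)
    web_docs = [r["content"] for r in response["results"]]
    # reuse the same sentence-strip refinement on the web documents
    return refine(query, web_docs, llm)
```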
The overall takeaway is an architecture-level reliability upgrade: CRAG turns RAG from a single-pass retrieval-and-trust system into a routed, quality-aware pipeline that can refuse bad internal context, supplement with web knowledge, and refine mixed evidence before generation—reducing the likelihood of hallucinated answers when the vector store misses the needed facts.
Cornell Notes
CRAG (Corrective RAG) addresses a key RAG failure: retrieved documents may be irrelevant, yet the LLM is still prompted to answer using that context, leading to confident wrong outputs. CRAG inserts a retrieval-evaluation step after vector search and routes the request into three cases: relevant → refine and generate from internal docs; irrelevant → rewrite the query, web-search, refine web docs, then generate; ambiguous/partially relevant → refine and merge internal “good” docs with refined web docs before generation. The transcript also shows how knowledge refinement works at the chunk level by decomposing documents into sentence strips, filtering strips by relevance, and recombining only kept content. The result is a reliability-focused RAG pipeline that treats retrieval quality as uncertain rather than guaranteed.
What exactly goes wrong in traditional RAG when retrieval is imperfect?
How does CRAG decide whether retrieved documents are safe to use?
Why add “knowledge refinement” even when retrieval returns relevant documents?
What changes when CRAG moves from “incorrect retrieval” to using the web?
How does CRAG handle the ambiguous case differently from correct and incorrect?
Review Questions
- In traditional RAG, what mechanism forces the LLM to answer even when retrieved documents are irrelevant, and why does that increase hallucination risk?
- Describe the three CRAG routing cases and what data sources (internal docs, web docs, or both) feed into generation for each case.
- How does sentence-level strip filtering in knowledge refinement improve context quality compared with using retrieved chunks directly?
Key Points
1. Traditional RAG can fail catastrophically when semantic retrieval returns irrelevant chunks, because the LLM is still prompted to answer using that context.
2. CRAG inserts a retrieval-evaluation step after vector search to score document relevance and route the request into correct, incorrect, or ambiguous paths.
3. Knowledge refinement improves RAG output by decomposing retrieved text into sentence strips, filtering strips by relevance, and recombining only kept content.
4. When retrieval is incorrect, CRAG rewrites the query for search, uses Tavily to fetch external documents, refines them, and then generates an answer from the refined web context.
5. When retrieval is ambiguous, CRAG merges internal “good” documents with refined web documents and runs refinement over the combined evidence before generation.
6. In the implementation shown, generation uses only documents whose evaluation scores exceed the lower threshold; low-scoring chunks are excluded from the context fed to the LLM.