Why Your RAG Gives Wrong Answers (And 4 Chunking Strategies to Fix It) | LangChain Text Splitters
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Chunking can break RAG coherence even when embeddings, vector search, and LLM context windows are strong.
Briefing
RAG systems often fail for a surprisingly mundane reason: chunking breaks the information the model needs, even when embeddings, vector search, and the LLM’s context window are strong. The core claim is that retrieval quality depends less on “bigger context” and more on how documents are split during ingestion—bad splits can destroy coherence, cut bullet lists in half, and leave the model without the surrounding definitions or headings that make an answer possible.
The discussion starts by challenging a common assumption: modern LLMs with large context windows and powerful embeddings don’t guarantee perfect recall. Even with improved models, relevant text can still be missed or diluted, and it’s impractical to stuff an entire company knowledge base into a single prompt. Chunking is therefore framed as a critical step in the RAG pipeline: raw documents are split into smaller chunks so retrieval can surface the right passages. When chunking goes wrong, the LLM receives insufficient or confusing context. In the example, a naive fixed-size character split (1,024 characters) slices through a “business days” bullet list: one chunk ends mid-list, the next begins mid-idea, and neither carries the heading or lead-in that makes the bullets interpretable. The result is context destruction rather than context creation.
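To make that failure concrete, here is a minimal sketch of the naive fixed-size split using LangChain’s CharacterTextSplitter. The 1,024-character size comes from the example; the file name and sample boundary are stand-ins.

```python
# Naive fixed-size character split: boundaries fall wherever 1,024
# characters happen to end, regardless of document structure.
from langchain_text_splitters import CharacterTextSplitter

text = open("policy.md").read()  # hypothetical source document

splitter = CharacterTextSplitter(
    separator="",      # no structural separator: split by raw character count
    chunk_size=1024,   # the fixed size from the example
    chunk_overlap=0,
)
chunks = splitter.split_text(text)
# A boundary can land mid-bullet: one chunk may end "Processing takes 3-5"
# and the next begin "business days...", so neither answers the question alone.
```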
Four chunking strategies are then presented in increasing complexity, each with trade-offs.
First, the recursive character text splitter uses separators in a hierarchy (new lines, paragraphs, sentences, etc.) to build chunks around a target size and overlap. It’s described as a common starting point and “not that bad” if trade-offs are understood, but the example shows it can still split inside structured lists—bullet lists become fragmented across chunks.
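A sketch of the recursive splitter with an explicit separator hierarchy; the size and overlap values here are illustrative choices, not numbers from the video.

```python
# Recursive splitting: try coarse separators first (paragraphs), then fall
# back to finer ones (lines, sentences, words) until chunks fit the target.
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = open("policy.md").read()  # hypothetical source document

splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],  # coarse-to-fine hierarchy
    chunk_size=1024,
    chunk_overlap=128,  # overlap softens, but does not fix, bad boundaries
)
chunks = splitter.split_text(text)
# A long bullet list has no dedicated separator in this hierarchy, so it
# can still be fragmented across two or more chunks.
```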
Second, the markdown header text splitter leverages document structure. If content is converted to markdown with meaningful headers (H1/H2), chunk boundaries align with human-intended sections. In the example, splitting on H1 and H2 yields more chunks and better preservation of responsibilities and their associated bullet points. The downside is loss of control over chunk size when sections are uneven.
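A sketch of header-based splitting on H1/H2, matching the example; the metadata labels in the tuples are arbitrary names.

```python
# Markdown header splitting: chunk boundaries follow the document's own
# section structure instead of a character budget.
from langchain_text_splitters import MarkdownHeaderTextSplitter

markdown_text = open("policy.md").read()  # assumes content converted to markdown

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2")],  # split on H1 and H2
)
docs = splitter.split_text(markdown_text)  # Documents with header metadata

for doc in docs:
    print(doc.metadata, len(doc.page_content))
# The trade-off in action: chunk length now tracks section length, so an
# uneven document yields uneven chunks unless long sections are post-split.
```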
Third, the semantic chunker uses embedding-based similarity to decide where to split. By measuring distances between sentences and applying a threshold, it aims to create chunks that are coherent by meaning rather than by formatting. The trade-off is speed and compute cost: it can be slow and depends heavily on embedding quality and hardware.
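A sketch of the semantic chunker from langchain_experimental; the embedding model and threshold type are illustrative choices rather than ones prescribed in the video.

```python
# Semantic chunking: embed sentences, measure neighbor distances, and split
# where the distance crosses a threshold (here, a percentile breakpoint).
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings

text = open("policy.md").read()  # hypothetical source document

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # illustrative model
)
splitter = SemanticChunker(embeddings, breakpoint_threshold_type="percentile")
docs = splitter.create_documents([text])
# Every sentence gets embedded, so runtime scales with document length and
# depends on the embedding model and hardware available.
```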
Fourth, an LLM-driven chunking approach asks a strong model to choose split points. Candidate split locations are proposed (e.g., after each markdown line that begins with a heading), and the LLM returns which indices to split at. The prompt includes document-specific instructions, such as keeping forms, images, and tables in separate chunks with their descriptive text. Using a reasoning model (Qwen3, the 4-billion-parameter variant) running locally via Ollama, the method produces cleaner, task-aligned chunks, described as “pretty much the best possible approach” when compute and model strength are available.
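The transcript does not show the exact code, but the flow can be sketched as follows: enumerate candidate split points (here, markdown heading lines), ask the local model which to use, and cut at the chosen indices. The prompt wording and the qwen3:4b Ollama tag are assumptions standing in for the video’s setup.

```python
# LLM-driven split-point selection: the model, not a heuristic, picks the
# boundaries, guided by document-specific rules in the prompt.
import re
import ollama  # assumes a local Ollama server with the model pulled

markdown_text = open("policy.md").read()
lines = markdown_text.splitlines()

# Candidate split points: the index of every markdown heading line.
candidates = [i for i, ln in enumerate(lines) if re.match(r"#{1,6} ", ln)]

numbered = "\n".join(f"[{i}] {ln}" for i, ln in enumerate(lines))
prompt = (
    f"Candidate split indices: {candidates}. Choose the indices where the "
    "document should be split into retrieval chunks. Keep forms, images, "
    "and tables in the same chunk as their descriptive text. Reply with a "
    f"comma-separated list of indices only.\n\nDocument:\n{numbered}"
)
reply = ollama.chat(model="qwen3:4b", messages=[{"role": "user", "content": prompt}])

# Keep only indices the model was actually offered.
chosen = [int(n) for n in re.findall(r"\d+", reply["message"]["content"])]
chosen = sorted(set(i for i in chosen if i in candidates))

bounds = [0, *chosen, len(lines)]
chunks = ["\n".join(lines[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]
```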
A bonus technique adds Anthropic-style contextual retrieval: each chunk is enriched with extra context so the model understands what the chunk represents within the larger document. Using a visual language model (Qwen 2.5 VL) and the first page image, the system generates 2–3 sentences of contextual background and prepends it to the relevant chunk text—especially helpful for standalone sections like a complaint form.
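A hedged sketch of that enrichment step, again via Ollama: the model tag, image path, and prompt are assumptions, and the chunk shown stands in for the complaint-form section.

```python
# Contextual retrieval: generate 2-3 sentences of document-level background
# from the first page image and prepend it to a standalone chunk.
import ollama

resp = ollama.chat(
    model="qwen2.5vl:7b",  # assumed Ollama tag for Qwen 2.5 VL
    messages=[{
        "role": "user",
        "content": (
            "In 2-3 sentences, state what document this page belongs to and "
            "what it covers, as background context for a retrieval chunk."
        ),
        "images": ["first_page.png"],  # hypothetical render of page 1
    }],
)
background = resp["message"]["content"].strip()

form_chunk = "## Complaint Form\n..."  # placeholder for the extracted chunk
contextualized = f"{background}\n\n{form_chunk}"  # embed/index this version
```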
The takeaway is pragmatic: choose chunking based on document type. Start with markdown header splitting when markdown structure exists; use recursive character splitting for quick prototypes; try semantic chunking if you can afford compute; and use LLM-based split-point selection for the highest-quality chunks. Contextual retrieval can further improve answers when chunks need “document-level” grounding.
Cornell Notes
Chunking quality is a major driver of RAG failures, often more than embeddings or the vector database. Poor splits can cut through bullet lists and headings, leaving retrieved chunks without the context needed to answer questions. The transcript walks through four LangChain chunking strategies: recursive character splitting (fast but can fragment structured content), markdown header splitting (better alignment with document sections but less control over chunk size), semantic chunking (meaning-based splits using embedding distances but slower and compute-heavy), and LLM-driven split-point selection (highest-quality chunks by asking a model to choose where to split, including special handling for forms/images/tables). A bonus technique adds contextual retrieval by enriching each chunk with background generated from the document’s first page image using a visual language model, improving performance for standalone sections like forms.
Why does chunking cause “wrong answers” in RAG even when embeddings and the LLM are strong?
What are the practical strengths and weaknesses of recursive character text splitting?
When does markdown header text splitting outperform character-based methods?
How does semantic chunking decide split points, and what’s the cost?
What makes LLM-driven split-point selection more accurate than the earlier strategies?
How does contextual retrieval (Anthropic-style) improve answers for form-like sections?
Review Questions
- If a RAG system retrieves the “right” passage but still answers incorrectly, what chunking failure modes should be checked first (e.g., broken bullet lists, missing headings, or incoherent boundaries)?
- Compare markdown header splitting and semantic chunking: which one depends on document formatting, which one depends on embedding similarity, and how do those dependencies affect chunk size control and runtime?
- Why might LLM-driven split-point selection be worth the extra compute in documents with forms, tables, and images? What prompt instructions would you include to preserve those relationships?
Key Points
1. Chunking can break RAG coherence even when embeddings, vector search, and LLM context windows are strong.
2. Naive fixed-size character splitting can slice through structured elements like bullet lists, removing the context needed to answer.
3. Recursive character text splitting is a common baseline but can still fragment lists and headings despite overlap.
4. Markdown header text splitting often works best when documents are converted to markdown with reliable H1/H2 structure.
5. Semantic chunking creates meaning-based boundaries using embedding distance thresholds, but it can be slow and compute-intensive.
6. LLM-driven split-point selection can produce higher-quality chunks by choosing boundaries with document-specific rules (e.g., keep forms/images/tables with their descriptions).
7. Contextual retrieval can further improve results by enriching chunks with document-level background generated from the first page image.