Chunking 101: The Invisible Bottleneck Killing Enterprise AI Projects
Based on the AI News & Strategy Daily video by Nate B Jones on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Chunking—how text is cut into retrieval-ready pieces—is a major, often invisible failure point for enterprise AI systems, and it can directly cause wrong, confident answers and wasted spend. A fintech deal nearly collapsed after an AI chatbot answered an indemnification question incorrectly because the relevant contract language was split mid-sentence across fixed-size token chunks. Retrieval pulled only the first chunk, so the system claimed "party A fully indemnifies party B," even though the contract's meaning depended on qualifying language that had been carried into the next chunk. The fix wasn't a smarter model; it was context engineering—chunking the data so the right meaning lands together when the system retrieves a small set of passages.
That same context problem also drives cost and reliability. In retrieval-augmented generation (RAG), the system typically retrieves only three to five chunks per question, chosen by semantic fit. If the true answer is fragmented across chunk boundaries, the model can't reconstruct the missing terms without guessing—fueling hallucinations. Bad chunking also inflates bills: retrieving extra chunks loads more tokens into the context window, which can overwhelm the model with irrelevant material and, ironically, degrade accuracy. The practical takeaway is blunt: chunking is a first line of defense against hallucinations and a lever for cutting model-provider spend by double-digit percentages when done well.
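The retrieval bottleneck can be sketched in a few lines. This toy example uses a bag-of-words "embedding" and cosine similarity in place of a real dense vector model; the contract snippets and function names are illustrative assumptions, not the pipeline from the source. Note the failure mode: with k=1, the indemnification clause is retrieved but the exception that qualifies it is not.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; production systems use dense vector models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank chunks by semantic fit and keep only the top k --
    # anything living outside those k chunks is invisible to the model.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Party A shall indemnify Party B",                           # clause start
    "except where losses arise from Party B's own negligence",   # the condition
    "Payment terms are net 30 days",
]
# The clause is retrieved; the exception that changes its meaning is not.
print(retrieve("does party A indemnify party B", chunks, k=1))
```

With a larger k the condition might be pulled in, but only at the price of more tokens per question—the cost trade-off described above.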
Agentic search doesn’t eliminate the need for chunking; it changes the trade-off. Agentic search uses iterative reasoning—searching, reading, reasoning, and searching again—so it can help with exploratory questions or multi-step tasks like aggregating the total impact of a marketing campaign across channels. But it can be 10x slower and 10x more expensive than well-targeted RAG retrieval. Even when agentic systems are used, they still rely on semantic selection from retrieved units; messy chunking makes that selection worse. The “no free lunch” message is that businesses must wrestle with their own data structure rather than expecting agentic search to bypass embeddings and chunking decisions.
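The search–read–reason loop can be sketched as plain control flow. Everything below—the helper callables, the stop condition, the counter—is an assumption standing in for a real agent; the point it illustrates is that each iteration adds another retrieval plus another model call, which is where the ~10x latency and cost multiplier comes from.

```python
def agentic_answer(question, search, read, reason, max_steps=5):
    """Sketch of an agentic search loop. `search`, `read`, and `reason`
    are assumed callables wrapping retrieval and an LLM; only the
    control flow is real here."""
    notes = []
    query = question
    for step in range(max_steps):
        results = search(query)                # semantic retrieval over chunks
        notes.append(read(results))            # digest what was found
        done, query = reason(question, notes)  # decide: answer, or refine query
        if done:
            return query, step + 1             # `query` now holds the answer
    return None, max_steps

# Trivial stubs: the answer is found on the second pass.
calls = {"n": 0}
def search(q):
    calls["n"] += 1
    return [f"doc matching: {q}"]
def read(results):
    return results[0]
def reason(q, notes):
    return (len(notes) >= 2, "final answer" if len(notes) >= 2 else "refined query")

answer, steps = agentic_answer("total campaign impact across channels?", search, read, reason)
print(answer, steps, calls["n"])  # two retrievals, two reasoning passes
```

Even in this sketch, the quality of what `search` returns is bounded by how the underlying chunks were cut—which is the "no free lunch" point.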
Five chunking principles emerge as the scalable path for production systems. First is context coherence: never split meaning across chunks (e.g., separating “defendant shall pay damages” from the conditions that follow). Respect natural boundaries such as contract sections, code functions/classes, or conversation speaker turns. Second is controlling the three levers—boundaries, size, and overlap—rather than relying on arbitrary token counts. Overlap (often 10–20%) acts as insurance when meaning spans chunk edges. Third, data type dictates strategy: legal text, source code, financial tables, and spreadsheets each require different chunking logic. For code, dependency graphs and “neighborhood chunking” (including called functions) can help; for messy, coupled code, agentic search may be a pragmatic bridge. Excel and financial data are especially tricky because they preserve relationship webs—time windows, categories, formulas, and pivot hierarchies can’t be chunked row-by-row.
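The three levers—boundaries, size, and overlap—can be sketched as a simple chunker. Boundaries come from pre-split sections (standing in for contract sections or code functions), and the word counts are illustrative defaults, not recommendations from the source; the overlap here works out to roughly 17%, inside the 10–20% range mentioned above.

```python
def chunk_with_overlap(sections, max_words=120, overlap_words=20):
    """Split text into chunks along natural boundaries (here: sections),
    carrying a small overlap so meaning at chunk edges survives."""
    chunks = []
    for section in sections:
        words = section.split()
        start = 0
        while start < len(words):
            end = min(start + max_words, len(words))
            chunks.append(" ".join(words[start:end]))
            if end == len(words):
                break
            start = end - overlap_words  # back up: insurance at the boundary
    return chunks

# A 250-word section yields three chunks, each sharing 20 words
# with its neighbor; no chunk ever spans two sections.
section = " ".join(f"w{i}" for i in range(250))
pieces = chunk_with_overlap([section])
print(len(pieces))  # 3
```

Because the loop resets per section, a sentence can be split inside a section (hence the overlap) but never across a section boundary—the first principle enforced structurally.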
Fourth is "Goldilocks" sizing: chunks too small lose context and lead to "I don't know," while chunks too large waste tokens and produce unfocused answers. The right approach is to build an evaluation set and test chunking strategies against it. Fifth is overlap as insurance, tailored to the data's structure (temporal overlap for time series, categorical overlap for categorical data). The overall argument is that chunking isn't a minor implementation detail—it's foundational to retrieval accuracy, hallucination control, and cost efficiency across both RAG and agentic search, especially when corporate data is messy and hard to re-architect later.
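The eval-driven approach to sizing can be sketched as a small harness. The strategy and retrieval functions here are hypothetical stand-ins for a real pipeline; the scoring rule is the key idea—a question only counts as answerable if the gold phrase survives chunking intact, so split meaning shows up directly as a measurable failure.

```python
def evaluate_chunking(strategy, documents, eval_set, retrieve, k=3):
    """Score a chunking strategy against a shared eval set.
    `strategy` turns a document into chunks; `retrieve` ranks chunks
    for a query. A hit requires the gold phrase to appear unbroken
    in some retrieved chunk."""
    chunks = [c for doc in documents for c in strategy(doc)]
    hits = sum(
        1 for question, gold_phrase in eval_set
        if any(gold_phrase in chunk for chunk in retrieve(question, chunks, k))
    )
    return hits / len(eval_set)

def word_overlap_retrieve(question, chunks, k):
    # Crude stand-in for semantic retrieval: rank by shared words.
    q = set(question.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]

doc = "defendant shall pay damages unless the claim is withdrawn"

def too_small(d):   # pathological: one word per chunk, context destroyed
    return d.split()

def coherent(d):    # clause kept intact
    return [d]

evals = [("who pays damages?", "pay damages")]
print(evaluate_chunking(too_small, [doc], evals, word_overlap_retrieve))  # 0.0
print(evaluate_chunking(coherent, [doc], evals, word_overlap_retrieve))   # 1.0
```

Running several candidate strategies through the same harness and eval set is how "Goldilocks" sizing becomes an empirical decision rather than an arbitrary token threshold.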
Cornell Notes
Chunking determines what information an AI can retrieve and therefore what it can answer correctly. In RAG systems, only a small set of chunks (often 3–5) is retrieved; if key contract or technical meaning is split across chunk boundaries, the system returns confident but wrong answers and may “hallucinate” to fill gaps. Chunking also affects cost: retrieving more chunks loads more tokens into the context window, raising spend and sometimes reducing accuracy by adding irrelevant context. Agentic search can help for exploratory or multi-step tasks, but it still depends on semantic retrieval units and is often far slower and more expensive than well-chunked RAG. Effective chunking follows five principles: preserve context coherence, tune boundaries/size/overlap, adapt to data type (contracts, code, spreadsheets), size for Goldilocks outcomes using evals, and use overlap as insurance.
Why did the fintech chatbot give a wrong indemnification answer even though the model sounded confident?
How does chunking influence both hallucinations and cost in RAG?
What’s the practical difference between RAG and agentic search, and why doesn’t agentic search remove chunking?
What does “context coherence” require when chunking contracts, code, or conversations?
How should chunking strategy change across data types like legal text, source code, and spreadsheets?
How do “size for Goldilocks outcomes” and overlap work together in practice?
Review Questions
- What failure mode occurs when a contract sentence is split across chunks, and how does retrieval behavior (e.g., 3–5 chunks) contribute to it?
- Which of the three chunking levers (boundaries, size, overlap) most directly affects retrieval accuracy, and why?
- How would you design an evaluation set to compare multiple chunking strategies for the same RAG pipeline?
Key Points
1. Chunking errors can produce confident but wrong answers because RAG retrieves only a few chunks; missing meaning across chunk boundaries can’t be recovered reliably.
2. Bad chunking increases hallucinations and costs by forcing retrieval of extra chunks and injecting irrelevant context into the model’s context window.
3. Agentic search can help for exploratory and multi-step tasks, but it still depends on semantic retrieval units and is often far slower and more expensive than well-chunked RAG.
4. Preserve context coherence by cutting along natural semantic boundaries (contract sections, code functions/classes, conversation turns) and avoiding splits that separate claims from conditions.
5. Tune chunking using boundaries, size, and overlap; overlap (often 10–20%) acts as insurance when meaning spans chunk edges.
6. Use data-type-specific chunking strategies: legal text, dependency-aware code chunking, and relationship-preserving spreadsheet/financial chunking require different approaches.
7. Find the right chunk size with evals (“Goldilocks outcomes”) rather than arbitrary token thresholds, and validate strategies against a shared question set.