Advanced RAG 03 - Hybrid Search BM25 & Ensembles
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Hybrid search combines BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG relevance.
Briefing
Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair a sparse retriever—built on BM25—with a dense retriever—built on embeddings—so results benefit from both exact term overlap and meaning-based similarity. This matters because many real queries contain both “literal” signals (names, exact phrases, specific terms) and “semantic” signals (concepts that may not share the same wording). Hybrid search aims to reduce the failure modes of using only one approach.
BM25 is the keyword component. It is not a new algorithm: its roots lie in the probabilistic retrieval and TF-IDF work of the 1970s and 1980s, and it remains competitive with more modern deep-learning approaches in many search settings. BM25 builds sparse representations from word and n-gram counts, weighting terms by term frequency (TF) and inverse document frequency (IDF). A practical advantage is speed: sparse keyword scoring is often much faster to compute than dense embedding similarity.
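To make the mechanics concrete, here is a minimal pure-Python BM25 scorer over the transcript's toy documents. This is an illustrative sketch, not the retriever used in the video (which relies on a ready-made BM25 implementation); the `k1` and `b` values are the standard BM25 defaults.

```python
import math
from collections import Counter

# The toy documents from the transcript's example.
docs = [
    "I like apples",
    "I like computers by Apple",
    "I love fruit juice",
]

def tokenize(text):
    return text.lower().split()

corpus = [tokenize(d) for d in docs]
N = len(corpus)
avg_len = sum(len(d) for d in corpus) / N

def idf(term):
    # BM25's smoothed inverse document frequency: rarer terms score higher.
    n = sum(1 for d in corpus if term in d)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)

def bm25_score(query, doc, k1=1.5, b=0.75):
    counts = Counter(doc)
    score = 0.0
    for term in tokenize(query):
        tf = counts[term]
        if tf == 0:
            continue
        # Term-frequency saturation, normalised by document length.
        score += idf(term) * (tf * (k1 + 1)) / (tf + k1 * (1 - b + len(doc) / avg_len))
    return score

# With no stemming, the query "apple" matches only the literal token "apple",
# so the "computers by Apple" document ranks first -- exactly the keyword
# behaviour the briefing describes.
ranked = sorted(range(N), key=lambda i: bm25_score("apple", corpus[i]), reverse=True)
```

Note how the literal-match behaviour falls out of the sparse representation: "apples" and "apple" are different tokens here, which is why keyword retrieval alone can miss semantically obvious matches.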
On the semantic side, the dense retriever uses embeddings. In the transcript’s example, documents are embedded with OpenAI embeddings and stored in a FAISS vector store for similarity search. When a query like “green fruit” is issued, the dense retriever returns documents that match the concept, such as fruit-related text mentioning apples and oranges, even though the exact query words never occur in them.
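The dense side can be sketched the same way. The hand-made two-dimensional vectors below are stand-ins for real embeddings (invented for illustration, loosely encoding "fruit-ness" vs. "tech-ness"); FAISS performs this nearest-neighbour search at scale, but a brute-force cosine scan shows the same idea.

```python
import math

# Stand-in "embeddings" for the toy documents. Real pipelines would get
# these from an embedding model (e.g., OpenAI) and index them with FAISS.
doc_vectors = {
    "I like apples":             [0.9, 0.1],
    "I like computers by Apple": [0.2, 0.9],
    "I love fruit juice":        [0.8, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dense_search(query_vec, k=2):
    # Brute-force similarity scan; FAISS does this efficiently at scale.
    ranked = sorted(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]), reverse=True)
    return ranked[:k]

# A query like "green fruit" would embed near the "fruit" direction, so the
# fruit documents surface even though no query keyword appears in them.
green_fruit_vec = [1.0, 0.05]
top = dense_search(green_fruit_vec)
```

The point of the sketch is the contrast with BM25: relevance here is geometric closeness in embedding space, not token overlap, which is why "green fruit" can retrieve documents that never mention "green" or "fruit" in those exact words.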
The transcript demonstrates the difference with a small set of documents containing phrases like “I like apples,” “I like computers by Apple,” and “I love fruit juice.” With BM25, a query such as “apple” tends to favor direct keyword matches, which can pull in the “Apple” computer reference depending on how terms appear in the documents. With the dense retriever, a query like “green fruit” shifts results toward fruit-related meaning rather than literal token overlap.
The key step is combining both retrievers using an ensemble retriever. The ensemble takes the sparse BM25 retriever and the dense embedding retriever, then applies a weighting scheme to re-rank and merge their outputs. In the example, hybrid retrieval for “green fruit” elevates fruit-related documents first, while a query like “Apple phones” pushes “I like computers by Apple” to the top because the semantic signal aligns with the intended meaning.
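One common way an ensemble retriever can merge the two ranked lists is weighted reciprocal rank fusion, sketched below. Each document earns `weight / (rank + c)` from every list it appears in, so a document ranked highly by either retriever floats to the top, and the weights trade keyword precision against semantic recall. The rankings and equal weights here are illustrative, echoing the transcript's "Apple phones" example.

```python
def fuse(rankings, weights, c=60):
    """Weighted reciprocal rank fusion over several ranked document lists.

    rankings: one ranked list of documents per retriever.
    weights:  one weight per retriever (e.g., [0.5, 0.5] for an even blend).
    c:        damping constant; larger values flatten rank differences.
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + w / (rank + 1 + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs for the query "Apple phones": the sparse retriever
# keys on the token "Apple", the dense retriever on the tech/product meaning.
bm25_ranking  = ["I like apples", "I like computers by Apple"]
dense_ranking = ["I like computers by Apple", "I love fruit juice"]

# The document both retrievers surface wins under an even weighting.
merged = fuse([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
```

Shifting the weights (say, 0.75 toward BM25 for entity-heavy queries) is the main tuning knob the briefing alludes to when it says weights should match the use case.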
The takeaway is not that hybrid search always beats pure semantic search, but that it often helps when queries include exact words that appear in text—such as people’s names, product terms, or other specific entities—while still benefiting from semantic matching when wording varies. The example is intentionally simplified, but it’s positioned as a template: build a BM25 sparse retriever, build an embedding-based dense retriever (e.g., with FAISS), then combine them with an ensemble and tune weights to match the use case.
Cornell Notes
Hybrid search merges BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG results. BM25 builds sparse vectors from word and n-gram counts using TF-IDF-style weighting and is typically fast to compute. Dense retrieval embeds documents (e.g., with OpenAI embeddings) and uses a vector store like FAISS to find semantically similar text. An ensemble retriever combines both result sets and re-ranks them using weights, so queries with exact terms (names, specific products) and concept-based intent (meaning even without exact wording) both get better coverage. The transcript’s examples show “green fruit” favoring fruit-related documents and “Apple phones” favoring the “Apple” computer reference over unrelated “fruit” uses.
What is hybrid search in RAG, and why combine keyword and vector retrieval?
How does BM25 work at a high level, and what makes it fast?
How does the dense retriever behave differently from BM25 in the examples?
What does an ensemble retriever do when combining BM25 and dense retrieval?
When is hybrid search most likely to help compared with pure semantic search?
Review Questions
- In what ways do BM25 and embedding-based retrieval differ in what they treat as “relevance”?
- How does weighting in an ensemble retriever change the final ranking, and what might you tune for a new dataset?
- Give an example query where hybrid search would outperform pure semantic search, and explain why.
Key Points
1. Hybrid search combines BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG relevance.
2. BM25 uses sparse vectors built from word and n-gram counts with TF-IDF-style weighting and is typically fast to compute.
3. Dense retrieval embeds documents (e.g., with OpenAI embeddings) and uses a vector store such as FAISS to find semantically similar text.
4. An ensemble retriever merges results from multiple retrievers and re-ranks them using configurable weights.
5. Hybrid search tends to help most when queries include exact terms (names, product terms) while still benefiting from semantic matching.
6. Toy examples show “green fruit” favoring fruit-related meaning and “Apple phones” favoring the intended “Apple” entity via semantic alignment.