Advanced RAG 03 - Hybrid Search BM25 & Ensembles
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Hybrid search combines BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG relevance.
Briefing
Hybrid search in retrieval-augmented generation (RAG) combines two retrieval styles: keyword matching and semantic matching. The core idea is to pair a sparse retriever—built on BM25—with a dense retriever—built on embeddings—so results benefit from both exact term overlap and meaning-based similarity. This matters because many real queries contain both “literal” signals (names, exact phrases, specific terms) and “semantic” signals (concepts that may not share the same wording). Hybrid search aims to reduce the failure modes of using only one approach.
BM25 is the keyword component. It is not a new algorithm: its roots lie in the probabilistic retrieval and TF-IDF work of the 1970s and 1980s, and it remains competitive with more modern deep-learning approaches in many search settings. BM25 builds sparse representations from word and n-gram counts, weighting terms by term frequency (TF) and inverse document frequency (IDF). A practical advantage is speed: sparse keyword scoring is often much faster to compute than dense embedding similarity.
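To make the mechanics concrete, here is a minimal pure-Python BM25 scorer over the transcript's toy documents. This is an illustrative sketch, not the retriever used in the video (which relies on a ready-made BM25 implementation); the `k1` and `b` values are the standard BM25 defaults.

```python
import math
from collections import Counter

# The toy documents from the transcript's example.
docs = [
    "I like apples",
    "I like computers by Apple",
    "I love fruit juice",
]

def tokenize(text):
    return text.lower().split()

corpus = [tokenize(d) for d in docs]
N = len(corpus)
avg_len = sum(len(d) for d in corpus) / N

def idf(term):
    # BM25's smoothed inverse document frequency: rarer terms score higher.
    n = sum(1 for d in corpus if term in d)
    return math.log((N - n + 0.5) / (n + 0.5) + 1)

def bm25_score(query, doc, k1=1.5, b=0.75):
    counts = Counter(doc)
    score = 0.0
    for term in tokenize(query):
        tf = counts[term]
        if tf == 0:
            continue
        # Term-frequency saturation, normalised by document length.
        score += idf(term) * (tf * (k1 + 1)) / (tf + k1 * (1 - b + len(doc) / avg_len))
    return score

# With no stemming, the query "apple" matches only the literal token "apple",
# so the "computers by Apple" document ranks first -- exactly the keyword
# behaviour the briefing describes.
ranked = sorted(range(N), key=lambda i: bm25_score("apple", corpus[i]), reverse=True)
```

Note how the literal-match behaviour falls out of the sparse representation: "apples" and "apple" are different tokens here, which is why keyword retrieval alone can miss semantically obvious matches.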
On the semantic side, the dense retriever uses embeddings. In the transcript’s example, documents are embedded with OpenAI embeddings and stored in a FAISS vector store for similarity search. When a query like “green fruit” is issued, the dense retriever returns documents that match the concept, such as fruit-related text mentioning apples and oranges, even though the exact query words never occur in them.
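The dense side can be sketched the same way. The hand-made two-dimensional vectors below are stand-ins for real embeddings (invented for illustration, loosely encoding "fruit-ness" vs. "tech-ness"); FAISS performs this nearest-neighbour search at scale, but a brute-force cosine scan shows the same idea.

```python
import math

# Stand-in "embeddings" for the toy documents. Real pipelines would get
# these from an embedding model (e.g., OpenAI) and index them with FAISS.
doc_vectors = {
    "I like apples":             [0.9, 0.1],
    "I like computers by Apple": [0.2, 0.9],
    "I love fruit juice":        [0.8, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dense_search(query_vec, k=2):
    # Brute-force similarity scan; FAISS does this efficiently at scale.
    ranked = sorted(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]), reverse=True)
    return ranked[:k]

# A query like "green fruit" would embed near the "fruit" direction, so the
# fruit documents surface even though no query keyword appears in them.
green_fruit_vec = [1.0, 0.05]
top = dense_search(green_fruit_vec)
```

The point of the sketch is the contrast with BM25: relevance here is geometric closeness in embedding space, not token overlap, which is why "green fruit" can retrieve documents that never mention "green" or "fruit" in those exact words.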
The transcript demonstrates the difference with a small set of documents containing phrases like “I like apples,” “I like computers by Apple,” and “I love fruit juice.” With BM25, a query such as “apple” tends to favor direct keyword matches, which can pull in the “Apple” computer reference depending on how terms appear in the documents. With the dense retriever, a query like “green fruit” shifts results toward fruit-related meaning rather than literal token overlap.
The key step is combining both retrievers using an ensemble retriever. The ensemble takes the sparse BM25 retriever and the dense embedding retriever, then applies a weighting scheme to re-rank and merge their outputs. In the example, hybrid retrieval for “green fruit” elevates fruit-related documents first, while a query like “Apple phones” pushes “I like computers by Apple” to the top because the semantic signal aligns with the intended meaning.
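One common way an ensemble retriever can merge the two ranked lists is weighted reciprocal rank fusion, sketched below. Each document earns `weight / (rank + c)` from every list it appears in, so a document ranked highly by either retriever floats to the top, and the weights trade keyword precision against semantic recall. The rankings and equal weights here are illustrative, echoing the transcript's "Apple phones" example.

```python
def fuse(rankings, weights, c=60):
    """Weighted reciprocal rank fusion over several ranked document lists.

    rankings: one ranked list of documents per retriever.
    weights:  one weight per retriever (e.g., [0.5, 0.5] for an even blend).
    c:        damping constant; larger values flatten rank differences.
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + w / (rank + 1 + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs for the query "Apple phones": the sparse retriever
# keys on the token "Apple", the dense retriever on the tech/product meaning.
bm25_ranking  = ["I like apples", "I like computers by Apple"]
dense_ranking = ["I like computers by Apple", "I love fruit juice"]

# The document both retrievers surface wins under an even weighting.
merged = fuse([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
```

Shifting the weights (say, 0.75 toward BM25 for entity-heavy queries) is the main tuning knob the briefing alludes to when it says weights should match the use case.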
The takeaway is not that hybrid search always beats pure semantic search, but that it often helps when queries include exact words that appear in text—such as people’s names, product terms, or other specific entities—while still benefiting from semantic matching when wording varies. The example is intentionally simplified, but it’s positioned as a template: build a BM25 sparse retriever, build an embedding-based dense retriever (e.g., with FAISS), then combine them with an ensemble and tune weights to match the use case.
Cornell Notes
Hybrid search merges BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG results. BM25 builds sparse vectors from word and n-gram counts using TF-IDF-style weighting and is typically fast to compute. Dense retrieval embeds documents (e.g., with OpenAI embeddings) and uses a vector store like FAISS to find semantically similar text. An ensemble retriever combines both result sets and re-ranks them using weights, so queries with exact terms (names, specific products) and concept-based intent (meaning even without exact wording) both get better coverage. The transcript’s examples show “green fruit” favoring fruit-related documents and “Apple phones” favoring the “Apple” computer reference over unrelated “fruit” uses.
What is hybrid search in RAG, and why combine keyword and vector retrieval?
How does BM25 work at a high level, and what makes it fast?
How does the dense retriever behave differently from BM25 in the examples?
What does an ensemble retriever do when combining BM25 and dense retrieval?
When is hybrid search most likely to help compared with pure semantic search?
Review Questions
- In what ways do BM25 and embedding-based retrieval differ in what they treat as “relevance”?
- How does weighting in an ensemble retriever change the final ranking, and what might you tune for a new dataset?
- Give an example query where hybrid search would outperform pure semantic search, and explain why.
Key Points
1. Hybrid search combines BM25 keyword retrieval with embedding-based semantic retrieval to improve RAG relevance.
2. BM25 uses sparse vectors built from word and n-gram counts with TF-IDF-style weighting and is typically fast to compute.
3. Dense retrieval embeds documents (e.g., with OpenAI embeddings) and uses a vector store such as FAISS to find semantically similar text.
4. An ensemble retriever merges results from multiple retrievers and re-ranks them using configurable weights.
5. Hybrid search tends to help most when queries include exact terms (names, product terms) while still benefiting from semantic matching.
6. Toy examples show “green fruit” favoring fruit-related meaning and “Apple phones” favoring the intended “Apple” entity via semantic alignment.