Advanced RAG 05 - HyDE - Hypothetical Document Embeddings
Based on Sam Witteveen's video on YouTube. If you find this useful, support the original creator by watching, liking, and subscribing.
HyDE improves dense retrieval by embedding an LLM-generated hypothetical answer rather than the original query.
Briefing
HyDE (Hypothetical Document Embeddings) improves retrieval in RAG by using a large language model to draft a “hypothetical answer,” embedding that generated text, and then running similarity search against document chunks using the embedding of the answer—not the embedding of the original query. The practical payoff is straightforward: when user questions are vague, missing key nouns, or otherwise hard to match semantically, the LLM can supply the missing topical anchors (entities and concepts) so the vector search lands on the right material more reliably.
The core mechanism works like this. A user query goes into an LLM, which produces a short passage that would answer the question. That passage is never meant for the user; it exists to create a better embedding target. Instead of comparing “query embedding → chunk embeddings,” HyDE compares “hypothetical answer embedding → chunk embeddings.” Even when the hypothetical answer is imperfect, it often still contains the right terms and relationships—enough for embedding similarity to retrieve relevant chunks.
A concrete example centers on a query like “What is McDonald’s best items?” The query itself doesn’t explicitly mention food, burgers, or specific menu items. Directly embedding that query and searching a vector store can underperform because the query embedding lacks the very nouns the relevant chunks contain. HyDE changes the workflow: the LLM generates an answer mentioning fast food and likely bestsellers such as the Big Mac and other menu items, so the resulting embedding is aligned with the vocabulary actually present in the documents, increasing the chance that chunks about those items are retrieved.
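To make the mechanism concrete, here is a minimal sketch, assuming an OpenAI chat model for the draft and a local BGE model via sentence-transformers for embeddings; the model names and toy chunks are illustrative placeholders, not taken from the video:

```python
# Minimal HyDE sketch. Assumptions: OPENAI_API_KEY is set in the environment,
# and the model names and toy chunks below are illustrative placeholders.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # local BGE model

chunks = [
    "The Big Mac remains McDonald's flagship burger worldwide.",
    "McDonald's fries are among its best-selling menu items.",
    "The company was founded in 1940 in San Bernardino, California.",
]
# Pre-embed the chunks; normalized vectors make cosine similarity a dot product.
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def hyde_search(query: str, k: int = 2) -> list[str]:
    # 1. Draft a hypothetical answer; it is never shown to the user.
    hypo = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": f"Write a short passage answering: {query}"}],
    ).choices[0].message.content
    # 2. Embed the hypothetical answer instead of the raw query.
    hypo_vec = embedder.encode([hypo], normalize_embeddings=True)[0]
    # 3. Rank chunks by cosine similarity to the answer embedding.
    scores = chunk_vecs @ hypo_vec
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

print(hyde_search("What is McDonald's best items?"))
```

Even if the drafted passage gets facts wrong, it will usually still contain terms like “Big Mac” or “menu item,” which is all the similarity search needs.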
HyDE also supports generating multiple hypothetical answers. Rather than relying on a single draft, the system can produce several candidate passages, embed each one, and combine them—such as by averaging embeddings—to create a more robust retrieval signal. This helps when the LLM’s first guess is incomplete or when the question’s intent is underspecified; the combined representation tends to preserve key concepts that recur across generations.
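A sketch of that multi-draft variant, under the same assumptions as above (placeholder model name; the number of drafts and the temperature are illustrative choices):

```python
# Multi-draft HyDE sketch: sample several hypothetical answers and average
# their embeddings. Model name, n, and temperature are assumptions.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")

def hyde_multi_vector(query: str, n: int = 4) -> np.ndarray:
    drafts = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            temperature=0.9,       # higher temperature -> more diverse drafts
            messages=[{"role": "user",
                       "content": f"Write a short passage answering: {query}"}],
        )
        drafts.append(resp.choices[0].message.content)
    vecs = embedder.encode(drafts, normalize_embeddings=True)
    mean = vecs.mean(axis=0)             # concepts shared across drafts dominate
    return mean / np.linalg.norm(mean)   # re-normalize for cosine search
```

Averaging damps the idiosyncratic details of any single draft while reinforcing the entities and concepts that recur across most of them.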
Prompting matters. The transcript highlights that customizing the LLM prompt can steer the hypothetical text toward the retrieval goal. For instance, a prompt can instruct the model to recommend a single food item (useful when the question expects one best-selling product), producing a shorter, more targeted hypothetical document. That targeted output then yields an embedding that better matches the intended document sections.
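As one possible shape for such a targeted prompt (the wording below is an assumption, not quoted from the video):

```python
# Prompt-customization sketch: steer the hypothetical text toward a single
# recommended item. The prompt wording here is an assumption.
from openai import OpenAI

client = OpenAI()

TARGETED_PROMPT = (
    "You are answering a customer question. Recommend exactly one food item "
    "and briefly explain why it is the best seller.\n\nQuestion: {question}"
)

def targeted_hypothetical(query: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": TARGETED_PROMPT.format(question=query)}],
    )
    return resp.choices[0].message.content
```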
The implementation uses OpenAI for the LLM and BGE embeddings for vectorization, though the approach is model-agnostic: any embedding system can be swapped in, including local or quantized models. The transcript includes an example showing that HyDE can still work even when the hypothetical answer is wrong in its details, as long as it mentions the right concepts. It also flags a key limitation: if the topic is entirely unfamiliar to the LLM, the hypothetical text may hallucinate unrelated content and harm retrieval, so HyDE should be used cautiously in those cases.
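Because the retrieval step only needs a text-to-vector function, swapping backends is a one-function change. A sketch with two interchangeable embedders (the model names are common defaults, chosen here as assumptions):

```python
# Swappable embedding backends: a hosted OpenAI model and a local BGE model.
# Model names are common defaults, used here as assumptions.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
local_model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def embed_openai(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def embed_bge(texts: list[str]) -> np.ndarray:
    return local_model.encode(texts, normalize_embeddings=True)

# Either function can back the hyde_search sketch above; nothing else changes.
```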
Overall, HyDE is presented as a simple but powerful upgrade to dense retrieval inside RAG—especially for short, noun-light, or ambiguous queries—by converting “hard-to-search questions” into “search-friendly hypothetical answers” before embedding and retrieval.
Cornell Notes
HyDE (Hypothetical Document Embeddings) boosts RAG retrieval by embedding a hypothetical answer generated by an LLM, then searching document chunks using that answer embedding. This helps when user queries are vague or omit key nouns—because the LLM can supply entities and topical terms that make vector similarity search more effective. The method can generate one hypothetical passage or multiple passages; multiple embeddings can be combined (e.g., averaged) to strengthen the retrieval signal. HyDE works best when the LLM has enough knowledge to produce a plausible, concept-aligned answer; if the topic is too unfamiliar, hallucinated content can mislead retrieval. Prompt customization is central: it can steer the hypothetical text toward the exact form and level of specificity needed for the downstream search.
- How does HyDE change the standard dense retrieval workflow in RAG?
- Why can HyDE outperform “query embedding → chunk embedding” similarity search?
- What role do multiple hypothetical generations play?
- How does prompt design affect HyDE’s retrieval quality?
- When should HyDE be avoided or used cautiously?
Review Questions
- In HyDE, what gets embedded for retrieval—the original query or the LLM-generated hypothetical passage—and why does that matter for noun-light questions?
- How would you adapt HyDE prompting if the user question expects a single specific item rather than a broad list?
- What failure mode occurs when the LLM lacks knowledge about the topic, and how would that show up in retrieval results?
Key Points
1. HyDE improves dense retrieval by embedding an LLM-generated hypothetical answer rather than the original query.
2. The hypothetical passage is used only to create an embedding target; it is not meant to be shown to the user.
3. HyDE is especially useful for short or vague queries that omit key nouns and entities needed for effective vector similarity search.
4. Generating multiple hypothetical answers and combining their embeddings (e.g., averaging) can make retrieval more robust.
5. Prompt customization steers what concepts appear in the hypothetical text, which directly affects what chunks are retrieved.
6. HyDE can fail when the LLM hallucinates unrelated content due to insufficient knowledge of the topic.
7. The approach is flexible: OpenAI can be used for the LLM and BGE embeddings for vectorization, but other embedding systems (including local/quantized) can be substituted.