Retrievers in LangChain | Generative AI using LangChain | Video 13 | CampusX
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Retrievers are runnable LangChain components that take a user query and return multiple LangChain Document objects from a data source.
Briefing
RAG systems live or die by retrieval quality, and LangChain’s retrievers are the modular “search engines” that pull the most relevant documents from a data source in response to a user query. In this CampusX walkthrough, retrievers are framed as runnable components: they take a query as input and return multiple LangChain Document objects, while internally searching a data store (vector database, API, or other sources). That modularity matters because it lets developers swap retrieval strategies and plug retrievers into larger chains without rewriting the whole pipeline.
The session begins by placing retrievers as the fourth core RAG component after document loading, text splitting, and vector stores. A retriever is defined as a LangChain component that fetches relevant documents from a data source for a given user query. The walkthrough emphasizes that LangChain doesn’t rely on a single retriever type: multiple retrievers exist for different use cases, and all are “runnable,” meaning they can be composed into chains for end-to-end RAG workflows.
From there, retrievers are categorized in two practical ways. First, by the data source they query: examples include a Wikipedia retriever that calls the Wikipedia API and selects relevant articles (using keyword-based matching rather than semantic search), and a vector-store retriever that performs semantic similarity search using vector embeddings. Second, by retrieval strategy: the lecture previews advanced approaches such as MMR (Maximal Marginal Relevance), Multi-Query retrieval, and Contextual Compression.
The code demos start with the Wikipedia retriever. A retriever object is created with parameters such as the number of top results (top_k_results) and the language (lang). Calling the retriever's invoke method sends the query to the Wikipedia API and returns a list of Document objects, each with page content and metadata. A key clarification follows: this is not merely a document loader that bulk-fetches everything; it behaves like a search mechanism that decides which articles to return based on relevance.
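A minimal sketch of that demo, assuming the langchain-community package provides WikipediaRetriever (the query string is illustrative, not the lecture's exact code):

```python
from langchain_community.retrievers import WikipediaRetriever

# top_k_results caps how many articles come back; lang selects the wiki edition.
retriever = WikipediaRetriever(top_k_results=2, lang="en")

# invoke() hits the Wikipedia API and returns a list of Document objects,
# each carrying the article text in page_content plus metadata like the title.
docs = retriever.invoke("history of artificial intelligence")

for doc in docs:
    print(doc.metadata.get("title"), "->", doc.page_content[:120])
```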
Next comes the vector-store retriever using Chroma and OpenAI embeddings. Documents are embedded into dense vectors, stored in a vector database, and retrieved by comparing the query embedding against stored document embeddings. The lecture also addresses why a retriever wrapper can still be useful even when a vector store can run similarity search directly: the retriever becomes a standardized runnable interface that enables swapping in more advanced retrieval strategies later.
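A minimal sketch of the vector-store retriever, assuming the langchain-chroma and langchain-openai packages are installed and OPENAI_API_KEY is set; the sample documents are invented for illustration:

```python
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

docs = [
    Document(page_content="LangChain provides building blocks for LLM apps."),
    Document(page_content="Chroma is a vector database that stores embeddings."),
    Document(page_content="Embeddings map text into dense numeric vectors."),
]

# Embed the documents with OpenAI embeddings and index them in Chroma.
vectorstore = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings())

# as_retriever() wraps the store in the standard runnable retriever interface;
# k controls how many documents a query returns.
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

for doc in retriever.invoke("What does Chroma do?"):
    print(doc.page_content)
```

Because as_retriever() returns a standard runnable, the same invoke() call keeps working if Chroma is later swapped for another store or the plain similarity search is upgraded to an advanced strategy.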
Three advanced retrievers are then unpacked conceptually and with examples; a combined sketch follows below.

MMR tackles redundancy: instead of returning the top-k most similar documents, which may all repeat the same idea, it selects documents that are both relevant to the query and diverse from each other.

The Multi-Query retriever handles ambiguous questions by sending the original query to an LLM to generate multiple focused sub-queries, retrieving for each, then merging and deduplicating the results. This improves coverage when a single query could mean several things.

The Contextual Compression retriever improves answer quality by trimming retrieved documents: it first retrieves candidate documents with a base retriever, then uses an LLM-based compressor to keep only the parts relevant to the query, discarding unrelated sections to reduce noise and context length.
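A sketch of how these three strategies can be wired up, under the same package assumptions as above; the model name, sample documents, and query are placeholders rather than the lecture's exact code:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

docs = [
    Document(page_content="Regular exercise improves sleep and energy levels."),
    Document(page_content="A balanced diet supports focus and concentration."),
    Document(page_content="Exercise also boosts energy by improving circulation."),
]
vectorstore = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings())
llm = ChatOpenAI(model="gpt-4o-mini")

# 1) MMR: lambda_mult trades off relevance (1.0 = pure similarity)
#    against diversity among the selected documents.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "lambda_mult": 0.5},
)

# 2) Multi-Query: the LLM rewrites one ambiguous query into several focused
#    sub-queries, retrieves for each, and merges the deduplicated results.
multi_query = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
    llm=llm,
)

# 3) Contextual Compression: a base retriever fetches candidates, then an
#    LLM-based extractor keeps only the query-relevant passages.
compression = ContextualCompressionRetriever(
    base_compressor=LLMChainExtractor.from_llm(llm),
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

for r in (mmr_retriever, multi_query, compression):
    print([d.page_content for d in r.invoke("How can I boost my energy?")])
```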
The takeaway is forward-looking: many retrievers exist because RAG performance often needs iterative upgrades. When a baseline RAG system underperforms, swapping in advanced retrievers—rather than changing the entire architecture—can meaningfully improve relevance, diversity, and user experience.
Cornell Notes
Retrievers are LangChain components that fetch relevant documents from a data source in response to a user query, returning multiple LangChain Document objects. They're runnable, so they can be plugged into chains and swapped to change retrieval behavior without rebuilding the whole RAG pipeline. The lecture groups retrievers by (1) data source, such as the Wikipedia API vs. vector stores, and (2) retrieval strategy, such as MMR, Multi-Query, and Contextual Compression. MMR reduces redundancy by selecting relevant yet diverse documents. Multi-Query improves ambiguous queries by generating multiple sub-queries with an LLM, retrieving for each, then merging results. Contextual Compression trims retrieved documents to keep only query-relevant content, reducing noise and context length.
- What exactly does a retriever do inside a RAG pipeline, and what does it return?
- How can retrievers be categorized in LangChain?
- Why does MMR matter when similarity search returns redundant results?
- How does the Multi-Query retriever handle ambiguous user questions?
- What problem does the Contextual Compression retriever solve, and how?
Review Questions
- In what ways can retrievers be swapped to improve a RAG system without changing the rest of the pipeline?
- Describe how MMR differs from standard similarity search in terms of relevance and diversity.
- Explain the end-to-end flow of Multi-Query retrieval from an ambiguous user query to merged final results.
Key Points
1. Retrievers are runnable LangChain components that take a user query and return multiple LangChain Document objects from a data source.
2. LangChain retrievers can be categorized by data source (e.g., Wikipedia API vs. vector store) and by retrieval strategy (e.g., MMR, Multi-Query, Contextual Compression).
3. The Wikipedia retriever behaves like a search: it queries the Wikipedia API and selects relevant articles rather than loading everything.
4. Vector-store retrievers use embeddings for semantic similarity search by converting both documents and queries into dense vectors.
5. MMR reduces redundancy by selecting documents that are relevant to the query while also being diverse from each other.
6. The Multi-Query retriever improves ambiguous queries by using an LLM to generate multiple sub-queries, retrieving for each, then merging and deduplicating.
7. The Contextual Compression retriever improves answer quality by trimming retrieved documents to only query-relevant content, reducing noise and context length.