
Advanced RAG 01 - Self Querying Retrieval

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use semantic search only for fields where meaning matters (free-text descriptions), not for exact categorical or numeric attributes like year or artist name.

Briefing

RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is “self-querying retrieval”: insert a step where a large language model rewrites a user’s natural-language question into (1) a semantic target for vector search and (2) structured metadata filters that can be applied deterministically.

Instead of treating every query as meaning-only, the approach draws a clear line between what deserves semantic matching and what should be handled like database constraints. If someone asks for a movie from a specific year, the year should filter results via metadata rather than being embedded and compared in a vector store. The same principle applies to music: an artist name is better handled as a metadata equality lookup, while semantic search is reserved for the parts where meaning matters (e.g., descriptions, attributes, or free-text flavor notes).

LangChain’s self-querying retriever implements this by placing an LLM between the user query and retrieval. The user types something like “What are some red wines?” The LLM converts that into a structured query: it identifies which metadata field to filter (here, color) and what comparison to apply (e.g., color == red). When the question is semantic—such as “wines with fruity notes”—the LLM selects the description field as the semantic target and uses vector similarity to find matching entries. The returned results include both the matching documents and their metadata, making it easy to verify that the system is filtering and searching for the right reasons.
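To make the translation step concrete, here is a minimal sketch of the kind of structured query the LLM might emit for “What are some red wines?” and how such a filter applies deterministically. The tuple grammar, `apply_filter` helper, and wine records are illustrative assumptions, not LangChain’s actual API:

```python
# Toy wine records standing in for documents with metadata (made up).
WINES = [
    {"name": "Wine A", "color": "red", "rating": 96},
    {"name": "Wine B", "color": "white", "rating": 92},
    {"name": "Wine C", "color": "red", "rating": 94},
]

# The LLM's output for "What are some red wines?": no semantic component,
# just a deterministic metadata filter (color == red).
structured_query = {"semantic": None, "filter": ("eq", "color", "red")}

def apply_filter(records, comparison):
    """Apply one (operator, field, value) comparison deterministically."""
    op, field, value = comparison
    if op == "eq":
        return [r for r in records if r[field] == value]
    raise ValueError(f"unsupported operator: {op}")

red_wines = apply_filter(WINES, structured_query["filter"])
print([w["name"] for w in red_wines])  # -> ['Wine A', 'Wine C']
```

Because the filter is an exact equality check rather than a similarity score, a white wine can never leak into the results the way it might under pure embedding search.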

The example dataset is a wine catalog where each record includes metadata fields such as name, year (integer), rating (integer, with a note that floats work for fractional scores), grape, color, country, and a descriptive text used for semantic search. The retriever is configured with metadata schema information so the LLM knows which fields are strings, integers, or other types, and which field contains the free-text description.
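The schema the retriever is configured with can be pictured as a list of typed field descriptors. The plain-dict shape and `allowed_operators` helper below are assumptions for illustration (LangChain models this with its own attribute-description classes); the field names mirror the wine example:

```python
# Illustrative metadata schema: one descriptor per typed field, plus a
# description of the free-text field used for semantic search.
METADATA_SCHEMA = [
    {"name": "name",    "type": "string",  "description": "Name of the wine"},
    {"name": "year",    "type": "integer", "description": "Vintage year"},
    {"name": "rating",  "type": "integer", "description": "Critic score (float if fractional)"},
    {"name": "grape",   "type": "string",  "description": "Grape variety"},
    {"name": "color",   "type": "string",  "description": "Wine color"},
    {"name": "country", "type": "string",  "description": "Country of origin"},
]
DOCUMENT_CONTENT_DESCRIPTION = "Free-text description of the wine"

def allowed_operators(field_name):
    """Pick comparison operators by type: ranges only make sense for numbers."""
    field = next(f for f in METADATA_SCHEMA if f["name"] == field_name)
    if field["type"] in ("integer", "float"):
        return {"eq", "gt", "gte", "lt", "lte"}
    return {"eq"}

print(allowed_operators("year"))   # numeric field: range comparisons allowed
print(allowed_operators("color"))  # string field: equality only
```

Declaring types up front is what lets the LLM choose equality for strings and range comparisons for numbers instead of guessing.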

Once wired up, the system supports mixed queries that combine semantic and structured constraints. Queries like “fruity notes and rating above 97” trigger both a semantic match on the description and a numeric comparison on the rating field. Asking for “wines from Italy” can produce a purely structured filter with no semantic component. More complex requests—such as “earthy wines between 2015 and 2020”—generate composite conditions using logical operators (e.g., AND) and range comparisons on the year field.
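A composite condition like the one behind “rating above 97 between 2015 and 2020” can be evaluated as a small tree of comparisons joined by logical operators. The nested-tuple grammar below is an illustrative stand-in for LangChain’s internal query language; the semantic part of such a query (e.g., “earthy”) would be handled separately by the vector store:

```python
# Toy wine records with numeric metadata (values are made up).
WINES = [
    {"name": "Wine A", "year": 2014, "rating": 95},
    {"name": "Wine B", "year": 2017, "rating": 98},
    {"name": "Wine C", "year": 2019, "rating": 96},
]

def evaluate(condition, record):
    """Recursively evaluate a condition tree of comparisons and AND/OR nodes."""
    op = condition[0]
    if op in ("and", "or"):
        combine = all if op == "and" else any
        return combine(evaluate(child, record) for child in condition[1])
    _, field, value = condition
    return {
        "eq":  record[field] == value,
        "gt":  record[field] > value,
        "gte": record[field] >= value,
        "lt":  record[field] < value,
        "lte": record[field] <= value,
    }[op]

# "rating above 97 AND year between 2015 and 2020"
query = ("and", [("gt", "rating", 97),
                 ("gte", "year", 2015),
                 ("lte", "year", 2020)])
print([w["name"] for w in WINES if evaluate(query, w)])  # -> ['Wine B']
```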

The retriever also handles practical controls like result limits. If a user asks for “two wines,” the LLM can generate a limit parameter so the system returns only a small subset. It can even correct messy inputs: capitalization differences and minor misspellings still lead to valid metadata filters (e.g., mapping “Australia or New Zealand” into country equality constraints).
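Honoring a limit can be as simple as truncating after filtering, in the spirit of a user asking for “two wines.” The query shape and `retrieve` helper are illustrative, not LangChain’s API:

```python
# Toy wine records; three reds so the limit visibly truncates (made up).
WINES = [
    {"name": "Wine A", "color": "red"},
    {"name": "Wine B", "color": "red"},
    {"name": "Wine C", "color": "red"},
    {"name": "Wine D", "color": "white"},
]

# LLM output for something like "give me two red wines".
structured_query = {"filter": ("eq", "color", "red"), "limit": 2}

def retrieve(records, query):
    """Filter first, then truncate to the requested number of results."""
    op, field, value = query["filter"]
    assert op == "eq"  # equality only, for brevity
    hits = [r for r in records if r[field] == value]
    limit = query.get("limit")
    return hits[:limit] if limit is not None else hits

print([w["name"] for w in retrieve(WINES, structured_query)])  # -> ['Wine A', 'Wine B']
```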

Overall, self-querying retrieval turns RAG from “semantic search everywhere” into a hybrid retrieval strategy: semantic search where meaning is needed, metadata filtering where structure exists, and an LLM-generated bridge that translates natural language into both.

Cornell Notes

Self-querying retrieval fixes a common RAG failure mode: using vector semantic search for fields that should be treated as exact database attributes. In LangChain, an LLM rewrites a user’s question into structured metadata filters (e.g., color == red, country == Italy, rating > 97, year between 2015 and 2020) plus an optional semantic target (typically a description field) for vector search. This hybrid approach lets queries combine meaning-based matching (“fruity notes”) with deterministic constraints (“rating above 97”). It also supports operational needs like limiting results and tolerating capitalization or minor spelling issues. The result is more accurate retrieval than “semantic search only,” especially for numeric and categorical fields.

Why is semantic search a poor default for fields like year or artist name?

Fields such as year, artist name, or other categorical/numeric attributes behave like exact constraints. Using vector similarity for “year = 2012” can return semantically related but wrong years because embeddings capture meaning rather than strict equality. The better pattern is to filter by metadata (e.g., year == 2012) and reserve semantic search for free-text descriptions where meaning matters.

How does self-querying retrieval split a natural-language query into semantic vs. structured parts?

An LLM sits between the user query and retrieval. It reformats the question into (1) a semantic query that targets a specific text field for vector search and (2) metadata filter expressions that apply to typed fields. For example, “What are some red wines?” becomes a structured filter on the color field (color == red) with no semantic component. “Wines with fruity notes” selects the description field for semantic matching.

What metadata schema details does LangChain need for this to work well?

The retriever must be told what metadata fields exist and their types. In the wine example, fields include name (string/list of strings), year (integer), rating (integer; float if using fractional scores), country (string), color (string), grape (string/list), and a description field used for semantic search. With this schema, the LLM can generate correct operators like equality for strings and range comparisons for numbers.

What kinds of queries can combine semantic matching with metadata filtering?

Mixed queries work because the LLM can generate both components at once. Examples include “fruity notes and rating above 97,” which uses semantic similarity on the description while applying rating > 97. Another example is “earthy wines between 2015 and 2020,” which uses semantic matching for “earthy” and applies a year range (2015 <= year <= 2020) using logical operators like AND.

How does the system handle result limits and messy user input?

When users ask for a small number of results (e.g., “two wines”), the LLM can generate a limit parameter so retrieval returns only that many matches. It can also normalize input issues like capitalization differences and minor misspellings, still producing valid metadata filters such as country == Australia or country == New Zealand.
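The casing side of this normalization can be sketched as matching user terms case-insensitively against the known metadata values before emitting an OR of equality constraints. (Real misspelling correction relies on the LLM itself; the helper and tuple grammar here are illustrative assumptions.)

```python
# Canonical country values as stored in the metadata (illustrative set).
KNOWN_COUNTRIES = {"Australia", "New Zealand", "Italy", "France"}

def normalize_country(raw):
    """Map e.g. 'australia' to the canonical metadata value 'Australia'."""
    cleaned = " ".join(raw.split()).lower()
    for country in KNOWN_COUNTRIES:
        if cleaned == country.lower():
            return country
    return None  # unknown value: let the caller decide how to handle it

terms = ["australia", "new zealand"]
condition = ("or", [("eq", "country", normalize_country(t)) for t in terms])
print(condition)
# -> ('or', [('eq', 'country', 'Australia'), ('eq', 'country', 'New Zealand')])
```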

Review Questions

  1. When would you choose metadata filtering over semantic search in a RAG system, and what failure mode does this prevent?
  2. In self-querying retrieval, what information must be provided about metadata fields so the LLM can generate correct filters?
  3. Give an example of a query that requires both semantic matching and structured constraints, and specify which fields would be used for each.

Key Points

  1. Use semantic search only for fields where meaning matters (free-text descriptions), not for exact categorical or numeric attributes like year or artist name.

  2. Self-querying retrieval inserts an LLM step that converts a user question into both a semantic vector-search target and structured metadata filters.

  3. LangChain’s self-querying retriever requires a metadata schema (field names and types) so the LLM can generate correct operators like equality and numeric comparisons.

  4. Hybrid queries work well: semantic conditions (e.g., “fruity notes”) can be combined with structured constraints (e.g., rating > 97, year ranges).

  5. Result limits can be generated by the LLM (e.g., “two wines”) to avoid returning overly large result sets.

  6. Good language models can correct minor input issues like capitalization and small misspellings when mapping to metadata values.

Highlights

The biggest practical shift is stopping “semantic search for everything” and instead filtering typed fields deterministically while using vectors for descriptive meaning.
Self-querying retrieval turns questions like “red wines” into explicit metadata filters (color == red) rather than embedding the word “red” and hoping similarity behaves like equality.
Composite constraints—like “earthy wines between 2015 and 2020”—are generated as structured range checks combined with semantic matching on the description field.
Even with messy input (capitalization, misspellings), the LLM can still produce correct metadata equality filters and apply a limit when requested.

Topics

  • Self-Querying Retrieval
  • Hybrid RAG
  • Metadata Filtering
  • LangChain Retrievers
  • Vector Store Queries
