Advanced RAG 01 - Self Querying Retrieval
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
RAG systems break down when everything gets shoved through semantic search—even fields that should behave like exact lookups. The core fix is “self-querying retrieval”: insert a step where a large language model rewrites a user’s natural-language question into (1) a semantic target for vector search and (2) structured metadata filters that can be applied deterministically.
Instead of treating every query as meaning-only, the approach draws a clear line between what deserves semantic matching and what should be handled like database constraints. If someone asks for a movie from a specific year, the year should filter results via metadata rather than being embedded and compared in a vector store. The same principle applies to music: an artist name is better handled as a metadata equality lookup, while semantic search is reserved for the parts where meaning matters (e.g., descriptions, attributes, or free-text flavor notes).
LangChain’s self-querying retriever implements this by placing an LLM between the user query and retrieval. The user types something like “What are some red wines?” The LLM converts that into a structured query: it identifies which metadata field to filter (here, color) and what comparison to apply (e.g., color == red). When the question is semantic, such as “wines with fruity notes”, the LLM selects the description field as the semantic target and uses vector similarity to find matching entries. The returned results include both the matching documents and their metadata, making it easy to verify that the system is filtering and searching for the right reasons.
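To make the translation concrete, here is a minimal sketch of the object the LLM effectively produces. The class names and the `langchain_core.structured_query` import path reflect recent LangChain releases and are an assumption, not something the video spells out.

```python
# Sketch of the structured query the LLM generates for
# "What are some red wines?": the real work is one equality
# filter on metadata, not vector similarity.
# (Import path assumed for recent LangChain versions.)
from langchain_core.structured_query import Comparator, Comparison, StructuredQuery

structured = StructuredQuery(
    query="wine",  # semantic part; carries little weight for this question
    filter=Comparison(comparator=Comparator.EQ, attribute="color", value="red"),
    limit=None,
)
```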
The example dataset is a wine catalog where each record includes metadata fields such as name, year (integer), rating (integer, with a note that floats work for fractional scores), grape, color, country, and a descriptive text used for semantic search. The retriever is configured with metadata schema information so the LLM knows which fields are strings, integers, or other types, and which field contains the free-text description.
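A minimal end-to-end setup might look like the sketch below. The two wine records are invented for illustration, and the imports assume the split LangChain packages (`langchain`, `langchain-openai`, `langchain-chroma`) plus an OpenAI API key; adjust paths and models for your environment.

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Toy catalog: the free-text description goes in page_content (semantic
# target); exact attributes go in metadata (deterministic filters).
docs = [
    Document(
        page_content="Luxurious, full-bodied red with earthy, dark-fruit notes",
        metadata={"name": "Opus One", "year": 2018, "rating": 96,
                  "grape": "Cabernet Sauvignon", "color": "red", "country": "USA"},
    ),
    Document(
        page_content="Crisp white with fruity notes of citrus and green apple",
        metadata={"name": "Cloudy Bay", "year": 2021, "rating": 92,
                  "grape": "Sauvignon Blanc", "color": "white",
                  "country": "New Zealand"},
    ),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

# The schema the LLM consults to decide which field to filter and
# which operators (equality, numeric comparison) are valid.
metadata_field_info = [
    AttributeInfo(name="name", description="The name of the wine", type="string"),
    AttributeInfo(name="year", description="The year the wine was released",
                  type="integer"),
    AttributeInfo(name="rating", description="Critic rating on a 100-point scale",
                  type="integer"),
    AttributeInfo(name="grape", description="The grape used to make the wine",
                  type="string"),
    AttributeInfo(name="color", description="The color of the wine", type="string"),
    AttributeInfo(name="country", description="The country the wine is from",
                  type="string"),
]

retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(temperature=0),
    vectorstore,
    "Brief description of the wine",  # tells the LLM what the semantic field holds
    metadata_field_info,
)

print(retriever.invoke("What are some red wines?"))
```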
Once wired up, the system supports mixed queries that combine semantic and structured constraints. Queries like “fruity notes and rating above 97” trigger both a semantic match on the description and a numeric comparison on the rating field. Asking for “wines from Italy” can produce a purely structured filter with no semantic component. More complex requests—such as “earthy wines between 2015 and 2020”—generate composite conditions using logical operators (e.g., AND) and range comparisons on the year field.
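Reusing the retriever from the sketch above, queries of each flavor look like this; the LLM decides per query which parts become metadata filters and which become the semantic search string.

```python
# Semantic match on the description plus a numeric comparison on rating.
retriever.invoke("I want a wine that has fruity notes and a rating above 97")

# Purely structured: an equality filter on country, no meaningful semantic part.
retriever.invoke("What wines come from Italy?")

# Composite condition: AND of two range comparisons on the year field,
# combined with a semantic match on "earthy".
retriever.invoke("What's an earthy wine released between 2015 and 2020?")
```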
The retriever also handles practical controls like result limits. If a user asks for “two wines,” the LLM can generate a limit parameter so the system returns only a small subset. It can even correct messy inputs: capitalization differences and minor misspellings still lead to valid metadata filters (e.g., mapping “australia or New zealand” into an OR of clean country equality constraints).
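In LangChain this is opt-in: passing `enable_limit=True` to `SelfQueryRetriever.from_llm` lets the generated query carry a limit. A sketch, reusing the objects defined earlier:

```python
retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(temperature=0),
    vectorstore,
    "Brief description of the wine",
    metadata_field_info,
    enable_limit=True,  # lets the LLM emit a `limit` alongside the filter
)

# Despite the lowercase "australia" and odd capitalization, the LLM maps the
# countries to clean equality constraints and caps the result count at two.
retriever.invoke("what are two wines that come from australia or New zealand?")
```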
Overall, self-querying retrieval turns RAG from “semantic search everywhere” into a hybrid retrieval strategy: semantic search where meaning is needed, metadata filtering where structure exists, and an LLM-generated bridge that translates natural language into both.
Cornell Notes
Self-querying retrieval fixes a common RAG failure mode: using vector semantic search for fields that should be treated as exact database attributes. In LangChain, an LLM rewrites a user’s question into structured metadata filters (e.g., color == red, country == Italy, rating > 97, year between 2015 and 2020) plus an optional semantic target (typically a description field) for vector search. This hybrid approach lets queries combine meaning-based matching (“fruity notes”) with deterministic constraints (“rating above 97”). It also supports operational needs like limiting results and tolerating capitalization or minor spelling issues. The result is more accurate retrieval than “semantic search only,” especially for numeric and categorical fields.
- Why is semantic search a poor default for fields like year or artist name?
- How does self-querying retrieval split a natural-language query into semantic vs. structured parts?
- What metadata schema details does LangChain need for this to work well?
- What kinds of queries can combine semantic matching with metadata filtering?
- How does the system handle result limits and messy user input?
Review Questions
- When would you choose metadata filtering over semantic search in a RAG system, and what failure mode does this prevent?
- In self-querying retrieval, what information must be provided about metadata fields so the LLM can generate correct filters?
- Give an example of a query that requires both semantic matching and structured constraints, and specify which fields would be used for each.
Key Points
1. Use semantic search only for fields where meaning matters (free-text descriptions), not for exact categorical or numeric attributes like year or artist name.
2. Self-querying retrieval inserts an LLM step that converts a user question into both a semantic vector-search target and structured metadata filters.
3. LangChain’s self-querying retriever requires a metadata schema (field names and types) so the LLM can generate correct operators like equality and numeric comparisons.
4. Hybrid queries work well: semantic conditions (e.g., “fruity notes”) can be combined with structured constraints (e.g., rating > 97, year ranges).
5. Result limits can be generated by the LLM (e.g., “two wines”) to avoid returning overly large result sets.
6. Good language models can correct minor input issues like capitalization and small misspellings when mapping to metadata values.