6-Building Advanced RAG Q&A Project With Multiple Data Sources With Langchain

Krish Naik · 4 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Wrap each knowledge source as a LangChain tool so an LLM can call it consistently during Q&A.

Briefing

A multi-source RAG Q&A setup becomes practical by combining LangChain “tools” with an agent that can route questions to the right retrieval backend. Instead of forcing a single knowledge source, the workflow builds separate retrieval tools for Wikipedia, a research-paper repository (garbled as “RVE/RF” in the transcript, most likely the Arxiv tool), and a LangSmith documentation search. An agent then decides which tool to call—first trying Wikipedia, then falling back to the research repository—so a single chat interface can answer questions across different domains.

The build starts with LangChain tools as the integration layer. Tools are described as interfaces an LLM can use to interact with external systems. The transcript lists many built-in options (including search and finance-related tools), but the implementation focuses on three retrieval-oriented tools. For Wikipedia, it uses LangChain’s Wikipedia query runner and Wikipedia API wrapper to create a configurable “top-k” document retriever, with a character limit (e.g., 200 characters) to control how much context gets returned.
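A minimal sketch of that Wikipedia tool in Python. The summary names the components descriptively; the class names below (WikipediaQueryRun and WikipediaAPIWrapper from langchain-community) are the standard LangChain ones, and the top_k_results=1 / doc_content_chars_max=200 values mirror the example settings above:

```python
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Return only the top result and cap returned context at 200 characters
api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=200)
wiki_tool = WikipediaQueryRun(api_wrapper=api_wrapper)

print(wiki_tool.name)  # "wikipedia" -- the name the agent uses to select this tool
```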

For custom content, the process shifts to document loading, chunking, and vector indexing. A web page is fetched using a web-based loader, then split into overlapping chunks using a recursive character text splitter (example settings include chunk size around 1000 characters with overlap around 200). Those chunks are embedded with OpenAI embeddings and stored in a vector database. The vector store is then converted into a retriever interface, and wrapped into a LangChain retrieval tool using create_retriever_tool so it can be invoked by the agent with a natural-language instruction like “search for information about LangSmith.”
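A sketch of that load, split, embed, index, and wrap pipeline. FAISS as the vector store and the LangSmith docs URL are illustrative assumptions; the chunk settings mirror the ~1000/~200 example above:

```python
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool

# Fetch the page and split it into overlapping chunks
docs = WebBaseLoader("https://docs.smith.langchain.com/").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Embed the chunks, index them, and expose the index as a retriever
vectordb = FAISS.from_documents(chunks, OpenAIEmbeddings())
retriever = vectordb.as_retriever()

# Wrap the retriever as an agent-callable tool with a natural-language instruction
langsmith_tool = create_retriever_tool(
    retriever,
    name="langsmith_search",
    description="Search for information about LangSmith.",
)
```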

The agent layer is where routing happens. An OpenAI tools agent is created with the combined tool list (Wikipedia tool, the custom/research retriever tool, and the LangSmith retrieval tool). A prompt is pulled from LangChain Hub (after installing the required LangChain Hub dependency), and an AgentExecutor is used to run the system end-to-end. With agent_executor.invoke, the same user question triggers tool selection and retrieval: a query like “tell me about LangSmith” routes to the LangSmith search tool, while a broader question like “tell me about machine learning” can route to Wikipedia, and a research-paper-related prompt routes to the custom repository retriever.
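A sketch of that agent layer, assuming the tools from the earlier snippets; research_tool is a hypothetical stand-in for the research-paper retriever, and gpt-3.5-turbo is an illustrative model choice:

```python
from langchain import hub  # requires `pip install langchainhub`
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

# wiki_tool and langsmith_tool come from the sketches above;
# research_tool is a hypothetical retriever tool for the paper repository
tools = [wiki_tool, research_tool, langsmith_tool]

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
prompt = hub.pull("hwchase17/openai-tools-agent")  # a standard OpenAI-tools agent prompt

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```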

The practical takeaway is that “advanced RAG” here isn’t a new model—it’s orchestration. By wrapping each data source as a tool and letting an agent choose the sequence of tool calls, the system delivers multi-source Q&A without hardcoding one retrieval path. The transcript also emphasizes debugging via verbose execution: enabling verbose output reveals which tool the agent invoked and helps validate the routing logic.

Cornell Notes

The project builds a multi-source RAG Q&A system by turning each knowledge backend into a LangChain “tool” and using an agent to route questions to the right tool. Wikipedia is wrapped using LangChain’s Wikipedia query runner/API wrapper with configurable top-k and context length. A custom web/research source is loaded, chunked with overlap, embedded using OpenAI embeddings, stored in a vector database, and converted into a retriever tool via create_retriever_tool. A combined tool list is passed into an OpenAI tools agent, with a prompt sourced from LangChain Hub. AgentExecutor then runs the pipeline so queries like “LangSmith” hit the LangSmith retriever, while other questions can fall back to Wikipedia or the custom repository.

What does LangChain mean by “tools,” and why are they central to multi-source RAG?

Tools are interfaces that an LLM/agent can call to interact with external systems. In this build, each data source (Wikipedia, a custom web/research corpus, and LangSmith documentation search) is wrapped as a tool. That wrapping standardizes how the agent requests information, enabling one chat flow to query multiple backends without rewriting retrieval logic for each source.

How is the Wikipedia retrieval tool constructed in the transcript?

Wikipedia retrieval uses LangChain community components: Wikipedia query runner and Wikipedia API wrapper. The wrapper is configured with parameters like top-k results (example: 1) and a max character limit for returned content (example: 200 characters). The resulting Wikipedia tool can be called by name (e.g., “Wikipedia”) and returns context suitable for Q&A.
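A quick way to sanity-check the tool before handing it to the agent is a direct call (hypothetical usage of the wiki_tool built in the earlier sketch):

```python
# Direct invocation; output is capped at the configured 200 characters
print(wiki_tool.invoke("machine learning"))
```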

What are the key steps to convert a custom web page into a retriever tool?

First, fetch content with a web-based loader pointed at a URL. Next, split the loaded text into chunks with a recursive character text splitter configured with a chunk size and overlap (example: chunk size ~1000 and overlap ~200). Then embed the chunks with OpenAI embeddings and store them in a vector database. Finally, convert the vector store into a retriever and wrap it into a tool using create_retriever_tool so the agent can search it with a natural-language instruction.

How does the agent decide which source to query?

The agent is created with a combined list of tools and a prompt that guides behavior. When agent_executor.invoke receives a user question, the agent selects a sequence of tool actions based on the question. The transcript describes a practical routing pattern: try Wikipedia first; if needed, use the research repository retriever; and for LangSmith-specific questions, use the LangSmith retrieval tool.
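In code, routing is just repeated invoke calls on the same executor; which tool fires is the agent's choice (tool names here match the earlier sketches):

```python
# The agent should pick langsmith_search for this one...
agent_executor.invoke({"input": "tell me about LangSmith"})

# ...and Wikipedia for a broad encyclopedic question
agent_executor.invoke({"input": "tell me about machine learning"})
```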

Why does AgentExecutor matter, and what does verbose output help with?

AgentExecutor is the runtime component that executes the agent’s chosen tool calls. Setting verbose=True reveals the internal routing, e.g., logs like “invoking Wikipedia with machine learning” or “invoking the research-paper tool with query …”. This makes it easier to debug whether the agent is hitting the intended retrieval backend.

Review Questions

  1. If a user asks a question that could be answered by both Wikipedia and the custom web corpus, what mechanisms in this design influence which tool gets called?
  2. Where in the pipeline do chunk size and chunk overlap affect retrieval quality, and how would you expect changing them to impact answers?
  3. What is the role of create_retriever_tool compared with directly using a vector store retriever?

Key Points

  1. Wrap each knowledge source as a LangChain tool so an LLM can call it consistently during Q&A.

  2. Use the Wikipedia query runner/API wrapper to create a configurable Wikipedia retrieval tool with top-k and context-length limits.

  3. For custom sources, load content, split into overlapping chunks, embed with OpenAI embeddings, store in a vector database, then convert to a retriever.

  4. Use create_retriever_tool to expose a retriever as an agent-callable tool with an instruction prompt.

  5. Combine multiple tools into a single tool list and let an OpenAI tools agent route questions to the most relevant backend.

  6. Run everything through AgentExecutor and enable verbose output to verify which tool was invoked for each query.

Highlights

Multi-source RAG is achieved by routing questions across separate retrieval tools (Wikipedia, custom vector retriever, and LangSmith search) rather than relying on one index.
Chunking with overlap (e.g., ~1000-character chunks with ~200 characters of overlap) preserves context continuity for vector retrieval.
AgentExecutor with verbose=True provides the practical debugging view of which retrieval tool the agent actually invoked.
