2-Building Multi Agentic AI RAG With Vector Database

Krish Naik · 4 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Set the Groq API key explicitly to prevent unintended fallback to OpenAI models in multi-agent configurations.

Briefing

Agentic AI can be made to answer questions by pulling knowledge from a vector database that’s populated from PDFs—turning raw documents into a searchable “knowledge base” the assistant can query on demand. The core build here wires an assistant to a local PG Vector instance running in Docker, then loads PDF content (via a URL) into vector embeddings so the assistant can retrieve relevant passages and generate grounded responses.

The workflow starts with a practical fix: when using Groq-based setups, the code must explicitly provide a Groq API key rather than relying on defaults that may fall back to OpenAI. With that environment configuration in place, the project shifts to infrastructure, running PG Vector locally through Docker Desktop. Once the database is up, the system uses a “knowledge base” layer that accepts one or more PDF URLs, extracts text from those PDFs, converts the text into vector embeddings, and stores them in a named collection inside PG Vector.
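
The summary doesn’t name the agent framework, but the pieces it describes (an assistant, a PDF-URL knowledge base, PG Vector collections) match phidata’s API; the following is a minimal ingestion sketch under that assumption, with the DB credentials, port, collection name, and S3 URL as illustrative placeholders:

```python
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.pgvector import PgVector2

# DB URL for the Dockerized PG Vector instance (user/password/db name and the
# 5532 port mapping are placeholders matching a typical local setup).
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Knowledge base: fetch the PDF from its URL, extract the text, embed it, and
# store the vectors in a named collection inside PG Vector. The embedding
# model defaults to whatever the framework is configured to use.
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://<your-bucket>.s3.amazonaws.com/recipes.pdf"],  # placeholder URL
    vector_db=PgVector2(collection="recipes", db_url=db_url),
)

# One-time ingestion; recreate=False skips re-embedding if the collection already exists.
knowledge_base.load(recreate=False)
```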

From there, the assistant is created (typically inside a helper function whose name mirrors the configuration) and connected to three capabilities: searching the knowledge base, reading chat history, and showing tool calls in its responses. The assistant is configured with a run ID (initially none, then assigned after the first start), a user identifier, and the knowledge base object. Key toggles, such as enabling knowledge search and chat-history reading, let the assistant both retrieve relevant document chunks and maintain conversational context.
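
Continuing the ingestion sketch above, a hedged configuration of the assistant itself (parameter names follow phidata’s Assistant and may differ across versions):

```python
from phi.assistant import Assistant

# Assistant wired to the knowledge base built above. run_id=None lets the
# framework assign an ID on the first run, which can later be reused to
# resume the same session for the same user.
assistant = Assistant(
    run_id=None,
    user_id="user",
    knowledge_base=knowledge_base,
    search_knowledge=True,    # tool: search the vector store for relevant chunks
    read_chat_history=True,   # tool: read prior turns for conversational context
    show_tool_calls=True,     # surface tool usage in the assistant's output
)
```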

A concrete example uses a recipe PDF hosted on Amazon S3 (the URL points to a “recipes.pdf” document). After the program runs, it reports that documents have been added to the vector database. The assistant then answers questions about ingredients and preparation steps, for example listing ingredient quantities (e.g., chicken, roasted peanuts) and returning directions for making specific dishes. The accuracy comes from retrieval: responses are generated using the most relevant chunks from the embedded PDF content stored in PG Vector.
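
That validation step can be reproduced with a couple of grounded questions against the objects from the sketches above (the question wording is illustrative):

```python
# Ask retrieval-grounded questions; answers are assembled from the most
# relevant chunks of the embedded recipe PDF stored in PG Vector.
assistant.print_response("List the ingredients and quantities for the chicken dish.", markdown=True)
assistant.print_response("How do I make it? Give the preparation steps.", markdown=True)
```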

Implementation details matter because the build depends on several libraries. The setup includes installing dependencies such as SQLAlchemy, pgvector, psycopg (the binary distribution), and pypdf for PDF reading. The project also emphasizes that the same pattern can be repeated with other vector databases (the discussion names alternatives like Qdrant, Pinecone, LanceDB, ChromaDB, and SingleStore), but the walkthrough focuses on PG Vector (via the PgVector2 integration).

The takeaway is less about a single chatbot and more about building an end-to-end agentic RAG pipeline: Docker-hosted vector storage → PDF ingestion via URL → embeddings + collection creation → assistant configured to search and respond. The assignment at the end pushes the same pipeline into a user-facing app using Streamlit, turning the backend workflow into an interactive front end for inputs and chat-style answers.
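
A minimal sketch of that Streamlit front end, assuming the configured assistant can be imported from a hypothetical rag_backend module; the widget layout is illustrative rather than taken from the walkthrough:

```python
import streamlit as st

from rag_backend import assistant  # hypothetical module exposing the configured assistant

st.title("PDF Knowledge Assistant")

# Keep chat history in the Streamlit session so page re-runs don't lose it.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay prior turns.
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask about the ingested PDFs"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # With stream=False the framework returns the full response text at once.
    answer = assistant.run(prompt, stream=False)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```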

Cornell Notes

This build creates an agentic RAG system where an assistant answers questions by searching a vector database populated from PDFs. A local PG Vector instance runs in Docker, and a knowledge-base component ingests PDF URLs, extracts text, converts it into embeddings, and stores it in a named collection. The assistant is then configured with tools to (1) search the knowledge base, (2) read chat history, and (3) generate responses grounded in retrieved document chunks. A Groq API key is required to avoid defaulting to OpenAI models. The result is a chatbot that can answer questions about ingredients and directions from a recipe PDF, and the same pattern can be extended to other vector databases and wrapped in Streamlit.

Why does the code need a Groq API key instead of relying on defaults?

The walkthrough notes a common issue in multi-agent setups: even when using Groq-based components, missing model configuration can cause the system to fall back to OpenAI by default. The fix is to set the environment variable (e.g., GROQ_API_KEY) and ensure the assistant/agent uses Groq’s models rather than OpenAI.
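
A sketch of that fix, assuming phidata’s Groq wrapper; the model name is illustrative:

```python
import os

from phi.assistant import Assistant
from phi.llm.groq import Groq

# Fail fast if the key is missing rather than letting the framework silently
# fall back to its default (OpenAI) model provider.
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set")

# Pin the assistant to a Groq-hosted model explicitly instead of relying on defaults.
assistant = Assistant(llm=Groq(model="llama3-70b-8192"))
```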

How does the system turn a PDF URL into something the assistant can search?

A knowledge-base object accepts PDF URLs. When the program runs, it reads the PDF from the URL, extracts the text, converts the text into vector embeddings, and loads those embeddings into PG Vector. Those embeddings are stored under a collection name (e.g., “recipes”) tied to the PG Vector DB URL.

What role does Docker play in this setup?

Docker Desktop hosts the PG Vector service locally. The walkthrough instructs running the PG Vector container so the database is reachable via a DB URL (including the exposed port like 5532). Once the container is running, the code uses that DB URL to create collections and store embeddings.
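
A quick connectivity check before loading embeddings might look like the following; the docker run command and credentials in the comment follow common pgvector image defaults and are assumptions, not details from the walkthrough:

```python
# Typical container start (run in a shell), mapping host port 5532 to Postgres 5432:
#   docker run -d -e POSTGRES_DB=ai -e POSTGRES_USER=ai -e POSTGRES_PASSWORD=ai \
#       -p 5532:5432 --name pgvector phidata/pgvector:16
from sqlalchemy import create_engine, text

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# A one-row query is enough to confirm the DB URL, port mapping, and credentials.
engine = create_engine(db_url)
with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())
```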

What does the assistant configuration enable for retrieval and conversation?

The assistant is created with settings that enable knowledge search and chat-history reading. Flags like search_knowledge and read_chat_history allow the assistant to query the vector database for relevant chunks and maintain context across turns, and a show-tool-calls option makes tool usage visible in responses.

What libraries are required to make the pipeline work end-to-end?

The walkthrough highlights installing dependencies including SQLAlchemy, pgvector, psycopg (binary), and pypdf. These support database connectivity, vector storage, PostgreSQL interaction, and PDF reading/text extraction.

How is the recipe example validated in practice?

After loading the recipe PDF into the vector database, the assistant answers targeted questions such as what the ingredients are and how to make a dish. The returned ingredient quantities and preparation directions match content from the embedded PDF, demonstrating retrieval-grounded responses.

Review Questions

  1. What steps are required to go from a PDF URL to a searchable vector collection in PG Vector?
  2. Which assistant settings are necessary to enable knowledge-base retrieval and chat-history context?
  3. How would you adapt the same pipeline if you swapped PG Vector for another vector database mentioned in the walkthrough?

Key Points

  1. Set the Groq API key explicitly to prevent unintended fallback to OpenAI models in multi-agent configurations.
  2. Run PG Vector locally using Docker Desktop and capture the correct DB URL (including the exposed port).
  3. Create a knowledge base that ingests PDF URLs, extracts text, generates embeddings, and stores them in a named PG Vector collection.
  4. Instantiate an assistant wired to the knowledge base with retrieval enabled and chat-history reading turned on.
  5. Install required dependencies (SQLAlchemy, pgvector, psycopg binary, pypdf) to support database access and PDF ingestion.
  6. Use targeted questions (e.g., ingredients, directions) to verify that answers are grounded in retrieved PDF chunks.
  7. Extend the backend RAG pipeline into an end-to-end app by wrapping it with Streamlit for a front-end chat experience.

Highlights

The pipeline’s “secret sauce” is retrieval: PDF text becomes embeddings in PG Vector, and the assistant answers by searching those stored chunks.
Docker-hosted PG Vector turns local development into a realistic production-like vector database workflow.
Enabling knowledge search plus chat-history reading lets the assistant both ground answers and maintain conversational context.
A single recipe PDF URL can power a functional chatbot that returns ingredients and step-by-step directions from the document content.
