Vectorless RAG Tutorial With PageIndex: No Vector DB and No Chunking Required
Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Vectorless RAG eliminates embeddings and vector databases by building an LLM tree and a JSON tree index from the PDF’s section hierarchy.
Briefing
Vectorless RAG replaces the usual “chunk → embed → store in a vector database → similarity search” pipeline with a document-structure index that an LLM can navigate directly. Instead of building embeddings and running cosine similarity over a vector store, PageIndex builds an LLM tree from a PDF, stores section-level summaries in a JSON tree index, and then uses an LLM to reason over that hierarchy to retrieve the most relevant sections—complete with page citations.
In traditional vector RAG, long PDFs are split into chunks, each chunk is embedded using an embedding model (OpenAI, Gemini, etc.), and the resulting vectors are stored in a vector database. At query time, the user question is embedded, the system performs similarity search to fetch the top matching chunks, and the retrieved text becomes context for the LLM to generate an answer.
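For contrast, here is a minimal sketch of that traditional pipeline, assuming the OpenAI embeddings API and a plain in-memory list standing in for the vector database (the model name, chunk size, and top-k value are illustrative choices, not values from the video):

```python
# Minimal vector-RAG baseline: chunk -> embed -> store -> similarity search.
# The "vector database" here is just an in-memory list for illustration.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)


def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size chunking; this is exactly the arbitrary windowing
    # that section-aware retrieval avoids.
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_store(document: str) -> list[tuple[str, np.ndarray]]:
    return [(c, embed(c)) for c in chunk(document)]


def retrieve(store: list[tuple[str, np.ndarray]], query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(
        store,
        key=lambda item: float(
            np.dot(q, item[1]) / (np.linalg.norm(q) * np.linalg.norm(item[1]))
        ),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]
```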
Vectorless RAG removes the vector database entirely. The core mechanism is an “LLM tree builder” that turns a document into a hierarchy of sections (for example, introduction → AI → machine learning → deep learning), where each node contains an LLM-generated summary of the content for that section. The output is a JSON tree index representing the document’s structure and node summaries. When a user asks a question, the LLM receives this JSON tree index as context, traverses the hierarchy, selects the nodes most likely to contain the answer, and then extracts the corresponding section content to produce a response. The retrieval loop is explicit: the system reasons about which nodes to use, checks whether the extracted content is sufficient, and if not, it iterates by selecting additional nodes.
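That retrieval loop can be sketched roughly as follows. The tiny tree below and its field names are illustrative stand-ins for a real PageIndex tree index, the prompts paraphrase the idea rather than reproduce the library's actual prompts, and get_section_text is a hypothetical helper that returns the page content belonging to a node:

```python
# Illustrative LLM tree search: the model sees node summaries, picks node IDs,
# and iterates when the extracted content is not yet sufficient.
import json
from openai import OpenAI

client = OpenAI()

# Tiny stand-in for a PageIndex JSON tree index (field names are illustrative).
tree_index = {
    "node_id": "0000", "title": "Introduction", "start_page": 1, "end_page": 5,
    "summary": "Introduces AI and outlines the document.",
    "nodes": [
        {"node_id": "0001", "title": "Machine Learning", "start_page": 6, "end_page": 15,
         "summary": "Supervised and unsupervised learning.", "nodes": []},
        {"node_id": "0002", "title": "Deep Learning", "start_page": 16, "end_page": 30,
         "summary": "Neural networks, CNNs, transformers.", "nodes": []},
    ],
}


def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def select_nodes(question: str, tree: dict) -> list[str]:
    # The LLM reasons over the hierarchy and returns the node IDs to read.
    prompt = (
        "Given this document tree index (JSON), return a JSON array of the "
        "node_id values most likely to contain the answer.\n\n"
        f"{json.dumps(tree, indent=2)}\n\nQuestion: {question}"
    )
    return json.loads(ask(prompt))


def tree_search(question: str, tree: dict, get_section_text, max_rounds: int = 3) -> str:
    # get_section_text(node_id) -> str is a hypothetical helper that pulls
    # the pages belonging to that node out of the PDF.
    context, seen = "", set()
    for _ in range(max_rounds):
        for node_id in select_nodes(question, tree):
            if node_id not in seen:
                seen.add(node_id)
                context += "\n" + get_section_text(node_id)
        check = ask(
            f"Context:\n{context}\n\nIs this sufficient to answer '{question}'? Reply yes or no."
        )
        if check.strip().lower().startswith("yes"):
            break  # otherwise loop again and let the LLM pick more nodes
    return context
```

In practice the node-selection call would be constrained with structured output or retried on parse failures; the sketch only aims to show the reason → select → extract → check loop that the walkthrough describes.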
A key detail is how the system handles documents with or without a table of contents (TOC). If a TOC exists, it can be used directly as an index of sections and page numbers. If no TOC is available, the system attempts TOC detection by scanning pages for section headers; when that also fails, the LLM reads the pages to infer headings and structure. Retrieval then becomes “section-aware splitting” rather than token-count chunking. The result is that the LLM can retrieve logically bounded sections (e.g., the deep learning section) instead of arbitrary text windows that may mix topics.
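A compact sketch of that fallback order, with simple regex heuristics standing in for PageIndex's actual TOC and header detection (which the video does not show in code form):

```python
# Illustrative fallback order for deriving document structure from PDF pages.
import re


def extract_toc(pages: list[str]) -> list[dict]:
    # Heuristic: a "Contents" page with entries like "3.2 Deep Learning .... 26".
    for page in pages[:5]:
        if "contents" in page.lower():
            entries = re.findall(r"^(.+?)\s*\.{2,}\s*(\d+)\s*$", page, re.MULTILINE)
            return [{"title": t.strip(), "page": int(p)} for t, p in entries]
    return []


def detect_headers(pages: list[str]) -> list[dict]:
    # Heuristic: numbered headings such as "3.1 Deep Learning" at line start.
    found = []
    for page_no, page in enumerate(pages, start=1):
        for m in re.finditer(r"^\d+(?:\.\d+)*\s+[A-Z].+$", page, re.MULTILINE):
            found.append({"title": m.group(0).strip(), "page": page_no})
    return found


def build_structure(pages: list[str]) -> list[dict]:
    structure = extract_toc(pages) or detect_headers(pages)
    if structure:
        return structure
    # Last resort in the described workflow: have the LLM read the pages and
    # infer headings and section boundaries (omitted from this sketch).
    raise NotImplementedError("LLM-based structure inference not shown here")
```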
The tutorial demonstrates this using the PageIndex library and a hosted chat interface (chat.pageindex.ai). A sample PDF is uploaded, the tree index is built asynchronously (roughly 30–90 seconds for a ~50-page document), and the resulting tree is inspected as JSON with node IDs, titles, page ranges, and node summaries. In the example query—“What is the syllabus covered in modern LLM fine-tuning?”—the LLM tree search identifies the relevant node IDs (such as nodes corresponding to the fine-tuning sections), extracts the associated content, and then generates an answer with citations formatted as section title and page number.
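The upload-and-inspect flow can be sketched as below. The base URL, endpoint paths, header name, and response fields are assumptions for illustration only (consult the PageIndex documentation for the real API); what comes from the walkthrough is the overall flow: upload the PDF, poll until the asynchronous build finishes, then read the JSON tree.

```python
# Illustrative flow: upload a PDF, wait for the asynchronous tree build,
# then inspect the JSON tree. Endpoints and fields below are assumptions,
# not the documented PageIndex API.
import os
import time
import requests

BASE = "https://api.pageindex.ai"                        # assumed base URL
HEADERS = {"api_key": os.environ["PAGEINDEX_API_KEY"]}   # assumed header name


def upload_pdf(path: str) -> str:
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE}/doc/", headers=HEADERS, files={"file": f})
    return resp.json()["doc_id"]                          # assumed response field


def wait_for_tree(doc_id: str, poll_seconds: int = 10) -> dict:
    while True:
        status = requests.get(f"{BASE}/doc/{doc_id}/", headers=HEADERS).json()
        if status.get("status") == "completed":           # assumed status value
            return requests.get(f"{BASE}/doc/{doc_id}/tree/", headers=HEADERS).json()
        time.sleep(poll_seconds)                          # build took ~30-90 s in the demo


doc_id = upload_pdf("sample.pdf")
tree = wait_for_tree(doc_id)
node = tree["nodes"][0]                                   # assumed tree shape
print(node["node_id"], node["title"], node["summary"])
```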
Finally, the walkthrough provides end-to-end Python-style steps: load environment variables (PageIndex API key and OpenAI API key), upload the PDF, poll for tree readiness, fetch the tree structure, run an LLM tree search that selects nodes via a prompt, and generate the final answer using only the retrieved context. The practical takeaway is a lower setup burden—no vector database, no embedding pipeline—at the cost of relying on structured indexing and LLM reasoning over the document hierarchy.
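The final generation step can be sketched as follows, assuming the section content has already been retrieved by the tree search; the prompt wording, field names, and the example call are illustrative, and python-dotenv is used here for loading the environment variables the walkthrough mentions:

```python
# Final step: generate the answer from retrieved section content only,
# citing section titles and page numbers. Prompt wording is illustrative.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # expects PAGEINDEX_API_KEY and OPENAI_API_KEY in a .env file
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def answer(question: str, retrieved_sections: list[dict]) -> str:
    # Each retrieved section carries its title, page range, and text, so the
    # model can cite "(<section title>, p. <page>)" directly.
    context = "\n\n".join(
        f"[{s['title']} | pages {s['start_page']}-{s['end_page']}]\n{s['text']}"
        for s in retrieved_sections
    )
    prompt = (
        "Answer the question using ONLY the context below. "
        "Cite each claim as (section title, page number).\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


print(answer(
    "What is the syllabus covered in modern LLM fine-tuning?",
    [{"title": "Modern LLM Fine-Tuning", "start_page": 32, "end_page": 41,
      "text": "...extracted section content goes here..."}],
))
```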
Cornell Notes
Vectorless RAG builds a navigable document structure instead of embeddings. PageIndex turns a PDF into an LLM tree, where each node represents a section (with node IDs, titles, page ranges, and LLM-generated summaries) stored as a JSON tree index. At question time, an LLM receives the JSON tree index, reasons over it to select the most relevant nodes, extracts the corresponding section content, and generates an answer with citations (section title and page number). If a TOC exists, structure comes from it; if not, the system detects TOC via headers or infers headings and structure by reading pages. This approach avoids vector databases and similarity search, relying on section-aware retrieval and hierarchical reasoning.
- How does vectorless RAG avoid the embedding + vector database pipeline used in traditional RAG?
- What exactly is stored in the JSON tree index, and why does that help retrieval?
- How does the system handle PDFs that lack a table of contents (TOC)?
- What does “LLM tree search” do during retrieval?
- How are citations produced in the vectorless RAG workflow?
- What are the practical steps shown for building and querying a vectorless index?
Review Questions
- In traditional vector RAG, which two embedding steps occur (and what do they enable)?
- When a PDF has no TOC, what mechanisms does the system use to infer document structure before building the JSON tree index?
- During LLM tree search, what triggers the retrieval loop to select additional nodes?
Key Points
1. Vectorless RAG eliminates embeddings and vector databases by building an LLM tree and a JSON tree index from the PDF’s section hierarchy.
2. PageIndex generates node-level summaries for each logical section so the LLM can reason over structure and select relevant nodes at query time.
3. Retrieval becomes section-aware: it uses headings/TOC-derived structure (or inferred structure) rather than token-count chunking.
4. LLM tree search selects node IDs, extracts section content from those nodes, and can iterate if the extracted context is insufficient.
5. The final answer is generated using only retrieved context and includes citations tied to section titles and page numbers.
6. The end-to-end workflow includes uploading a PDF, waiting for asynchronous tree indexing, inspecting the JSON tree, then running LLM tree search and answer generation.