
Vectorless RAG Tutorial With PageIndex-No VectorDB And Chunking Required

Krish Naik · 5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Vectorless RAG eliminates embeddings and vector databases by building an LLM tree and a JSON tree index from the PDF’s section hierarchy.

Briefing

Vectorless RAG replaces the usual “chunk → embed → store in a vector database → similarity search” pipeline with a document-structure index that an LLM can navigate directly. Instead of building embeddings and running cosine similarity over a vector store, PageIndex builds an LLM tree from a PDF, stores section-level summaries in a JSON tree index, and then uses an LLM to reason over that hierarchy to retrieve the most relevant sections—complete with page citations.

In traditional vector RAG, long PDFs are split into chunks, each chunk is embedded using an embedding model (OpenAI, Gemini, etc.), and the resulting vectors are stored in a vector database. At query time, the user question is embedded, the system performs similarity search to fetch the top matching chunks, and the retrieved text becomes context for the LLM to generate an answer.
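
For contrast, here is a minimal sketch of that traditional pipeline, using the OpenAI embeddings API and plain cosine similarity standing in for a real vector database; the file name, chunk size, and model choice are illustrative only.

```python
# Traditional RAG sketch: chunk -> embed -> "store" -> embed query -> top-k.
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Naive fixed-size chunking (character windows stand in for token counts).
document = open("sample.txt").read()
chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]

chunk_vectors = embed(chunks)                        # what a vector DB would store
query_vector = embed(["What is deep learning?"])[0]  # embed the user question

# Cosine similarity against every stored chunk, then take the top 3 as context.
scores = chunk_vectors @ query_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
)
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:3]]
```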

Vectorless RAG removes the vector database entirely. The core mechanism is an “LLM tree builder” that turns a document into a hierarchy of sections (for example, introduction → AI → machine learning → deep learning), where each node contains an LLM-generated summary of the content for that section. The output is a JSON tree index representing the document’s structure and node summaries. When a user asks a question, the LLM receives this JSON tree index as context, traverses the hierarchy, selects the nodes most likely to contain the answer, and then extracts the corresponding section content to produce a response. The retrieval loop is explicit: the system reasons about which nodes to use, checks whether the extracted content is sufficient, and if not, it iterates by selecting additional nodes.
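
To make the structure concrete, the sketch below shows roughly what one branch of such a JSON tree index might look like. The field names (node ID, title, page range, summary, child nodes) follow what the walkthrough describes; the exact schema PageIndex produces may differ.

```python
# Illustrative shape of a JSON tree index node; not the exact PageIndex schema.
tree_index = {
    "node_id": "0000",
    "title": "Introduction",
    "start_page": 1,
    "end_page": 2,
    "summary": "Course overview and the relationship between AI, ML, and DL.",
    "nodes": [
        {
            "node_id": "0001",
            "title": "Machine Learning",
            "start_page": 3,
            "end_page": 7,
            "summary": "Supervised and unsupervised learning fundamentals.",
            "nodes": [],
        },
        {
            "node_id": "0002",
            "title": "Deep Learning",
            "start_page": 8,
            "end_page": 12,
            "summary": "Neural networks, transformers, and fine-tuning.",
            "nodes": [],
        },
    ],
}
```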

A key detail is how the system handles documents with or without a table of contents (TOC). If a TOC exists, the system can use it as an index of sections and page numbers. If no TOC is available, the system attempts TOC detection by scanning the pages for headers; when that fails, the LLM reads the pages directly to infer headings and structure. Retrieval then becomes “section-aware splitting” rather than token-count chunking. The result is that the LLM can retrieve logically bounded sections (e.g., the deep learning section) instead of arbitrary text windows that may mix topics.
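
The idea of section-aware splitting can be illustrated with a toy heading-based splitter; this is only a sketch of the concept, not PageIndex's actual TOC detection or LLM-based heading inference.

```python
# Toy "section-aware splitting": break text at detected headings so each piece
# is a logical section rather than a fixed-size token window.
import re

def split_by_headings(text: str) -> list[dict]:
    # Assumes numbered headings such as "3.2 Deep Learning" start a new section.
    heading = re.compile(r"^(\d+(?:\.\d+)*\s+.+)$", re.MULTILINE)
    matches = list(heading.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "title": m.group(1).strip(),
            "content": text[m.end():end].strip(),
        })
    return sections
```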

The tutorial demonstrates this using the PageIndex library and a hosted chat interface (chat.pageindex.ai). A sample PDF is uploaded, the tree index is built asynchronously (roughly 30–90 seconds for a ~50-page document), and the resulting tree is inspected as JSON with node IDs, titles, page ranges, and node summaries. In the example query—“What is the syllabus covered in modern LLM fine-tuning?”—the LLM tree search identifies the relevant node IDs (such as nodes corresponding to the fine-tuning sections), extracts the associated content, and then generates an answer with citations formatted as section title and page number.

Finally, the walkthrough provides end-to-end Python-style steps: load environment variables (PageIndex API key and OpenAI API key), upload the PDF, poll for tree readiness, fetch the tree structure, run an LLM tree search that selects nodes via a prompt, and generate the final answer using only the retrieved context. The practical takeaway is a lower setup burden—no vector database, no embedding pipeline—at the cost of relying on structured indexing and LLM reasoning over the document hierarchy.
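
The sketch below strings those steps together. The PageIndex endpoint paths, auth header, and response fields used here are assumptions made for illustration (the real API may differ; see the PageIndex documentation), and the tree-search and answer prompts are simplified stand-ins for the ones shown in the video.

```python
# End-to-end sketch: upload -> poll -> fetch tree -> LLM tree search -> answer.
# PageIndex URLs, headers, and JSON fields below are assumed for illustration.
import json
import os
import time

import requests
from openai import OpenAI

BASE = "https://api.pageindex.ai"                       # assumed base URL
HEADERS = {"api_key": os.environ["PAGEINDEX_API_KEY"]}  # assumed auth scheme
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# 1) Upload the PDF and capture the returned document ID.
with open("sample.pdf", "rb") as f:
    doc_id = requests.post(f"{BASE}/doc/", headers=HEADERS,
                           files={"file": f}).json()["doc_id"]

# 2) Poll until the asynchronous tree build finishes (~30-90 s for ~50 pages).
while requests.get(f"{BASE}/doc/{doc_id}/", headers=HEADERS).json().get("status") != "completed":
    time.sleep(5)

# 3) Fetch the JSON tree index, including node summaries.
tree = requests.get(f"{BASE}/doc/{doc_id}/tree/", headers=HEADERS,
                    params={"node_summary": "yes"}).json()

# 4) LLM tree search: ask the model which nodes likely contain the answer.
question = "What is the syllabus covered in modern LLM fine-tuning?"
search = json.loads(client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content":
               "Return JSON with keys 'thinking' and 'node_list' (IDs of the "
               "nodes most likely to answer the question).\n"
               f"Question: {question}\nTree: {json.dumps(tree)}"}],
).choices[0].message.content)

# 5) Extract content for the selected nodes and generate a cited answer.
def collect(node, wanted, out):
    if node.get("node_id") in wanted:
        out.append(node)
    for child in node.get("nodes", []):
        collect(child, wanted, out)

selected: list[dict] = []
collect(tree, set(search["node_list"]), selected)
context = "\n\n".join(
    f"[{n['title']} | pages {n.get('start_page')}-{n.get('end_page')}]\n"
    f"{n.get('text', n.get('summary', ''))}"
    for n in selected
)
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content":
               "Answer using only the context below and cite each claim as "
               "(section title, page number).\n"
               f"Context:\n{context}\n\nQuestion: {question}"}],
).choices[0].message.content
print(answer)
```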

Cornell Notes

Vectorless RAG builds a navigable document structure instead of embeddings. PageIndex turns a PDF into an LLM tree, where each node represents a section (with node IDs, titles, page ranges, and LLM-generated summaries) stored as a JSON tree index. At question time, an LLM receives the JSON tree index, reasons over it to select the most relevant nodes, extracts the corresponding section content, and generates an answer with citations (section title and page number). If a TOC exists, structure comes from it; if not, the system detects TOC via headers or infers headings and structure by reading pages. This approach avoids vector databases and similarity search, relying on section-aware retrieval and hierarchical reasoning.

How does vectorless RAG avoid the embedding + vector database pipeline used in traditional RAG?

Traditional vector RAG requires chunking, embedding each chunk, storing vectors in a vector database, then embedding the query and running similarity search to retrieve top-k chunks. Vectorless RAG skips embeddings and vector storage entirely. PageIndex builds an LLM tree from the PDF and outputs a JSON tree index where each node contains a summary of a logical section. The LLM then traverses and selects nodes directly from that JSON structure, extracts the relevant section content, and uses it as context to answer the question.

What exactly is stored in the JSON tree index, and why does that help retrieval?

The JSON tree index represents a hierarchy of document sections (e.g., preface → module → subsection). Each node includes a node ID, a title, page range information, and an LLM-generated summary of the content within that section. Because the LLM already has section-level summaries and structure, it can reason about which nodes likely contain the answer, then retrieve the exact sections rather than relying on approximate semantic matches from vector similarity.

How does the system handle PDFs that lack a table of contents (TOC)?

When a TOC is present, the system can use it to map sections to page numbers. When the TOC is missing, it first attempts TOC detection by scanning the pages for headers. If no usable TOC structure is found, the LLM reads the pages and infers headings and document structure. This enables section-aware splitting based on logical boundaries (headings/sections) rather than token-count chunking.

What does “LLM tree search” do during retrieval?

LLM tree search takes the user query plus the JSON tree index and uses an LLM prompt to identify which nodes most likely contain the answer. It returns a “thinking” output along with a node list (node IDs). The pipeline then extracts section content from those selected nodes. If the extracted content is insufficient, the process loops back to select additional nodes before generating the final answer.
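
A rough sketch of that loop is shown below: run a tree search, pull the content of the selected nodes, and ask the model whether the context is sufficient before either stopping or searching again. The prompts, field names, and helper functions are illustrative, not the PageIndex implementation.

```python
# Sketch of the iterative retrieval loop: search -> extract -> check -> repeat.
import json
from openai import OpenAI

client = OpenAI()

def ask_json(prompt: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)

def extract(node: dict, wanted: set) -> str:
    own = node.get("text", node.get("summary", "")) if node.get("node_id") in wanted else ""
    children = [extract(child, wanted) for child in node.get("nodes", [])]
    return "\n".join(filter(None, [own] + children))

def retrieve(question: str, tree: dict, max_rounds: int = 3) -> str:
    selected, context = set(), ""
    for _ in range(max_rounds):
        # "Thinking" plus a node list, as described above.
        search = ask_json(
            "Return JSON with keys 'thinking' and 'node_list' (IDs of nodes "
            f"most likely to answer the question), excluding {sorted(selected)}.\n"
            f"Question: {question}\nTree: {json.dumps(tree)}"
        )
        selected |= set(search["node_list"])
        context = extract(tree, selected)
        # Loop back for more nodes only if the context is judged insufficient.
        verdict = ask_json(
            'Return JSON of the form {"sufficient": true|false}: does the '
            f"context fully answer the question?\nQuestion: {question}\n"
            f"Context: {context}"
        )
        if verdict.get("sufficient"):
            break
    return context
```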

How are citations produced in the vectorless RAG workflow?

The answer generation prompt instructs the LLM to use only the provided context and to cite each claim with the section title and page number. During generation, the system appends context in a structured form (node title, page index/range, and section content), enabling citations like “(section title, page number)” in the final response.

What are the practical steps shown for building and querying a vectorless index?

The workflow shown is: (1) load PageIndex and OpenAI API keys from environment variables, (2) upload a PDF and capture the returned document ID, (3) poll until the tree index is ready (asynchronous build, ~30–90 seconds for ~50 pages), (4) fetch the tree structure as JSON (optionally with node summaries), and (5) run LLM tree search for a query, then generate the final answer using only the retrieved node context.

Review Questions

  1. In traditional vector RAG, which two embedding steps occur (and what do they enable)?
  2. When a PDF has no TOC, what mechanisms does the system use to infer document structure before building the JSON tree index?
  3. During LLM tree search, what triggers the retrieval loop to select additional nodes?

Key Points

  1. Vectorless RAG eliminates embeddings and vector databases by building an LLM tree and a JSON tree index from the PDF’s section hierarchy.

  2. PageIndex generates node-level summaries for each logical section so the LLM can reason over structure and select relevant nodes at query time.

  3. Retrieval becomes section-aware: it uses headings/TOC-derived structure (or inferred structure) rather than token-count chunking.

  4. LLM tree search selects node IDs, extracts section content from those nodes, and can iterate if the extracted context is insufficient.

  5. The final answer is generated using only retrieved context and includes citations tied to section titles and page numbers.

  6. The end-to-end workflow includes uploading a PDF, waiting for asynchronous tree indexing, inspecting the JSON tree, then running LLM tree search and answer generation.

Highlights

  • Vectorless RAG replaces similarity search over embeddings with an LLM that traverses a JSON tree index built from document structure.
  • When a TOC is missing, the system infers headings and structure by reading pages, enabling logical-boundary retrieval instead of token-window chunking.
  • LLM tree search returns a node list (node IDs), extracts the corresponding sections, and generates answers with section-title and page-number citations.
  • For a ~50-page PDF, tree indexing is asynchronous and typically completes in about 30–90 seconds in the demonstrated workflow.

Topics

  • Vectorless RAG
  • PageIndex
  • LLM Tree Search
  • Section-Aware Retrieval
  • JSON Tree Index
