
Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama

Venelin Valkov · 5 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Run a private “chat with your files” assistant locally by deploying DeepSeek-R1 (14B distilled) with Ollama and connecting it to a Streamlit UI.

Briefing

A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and Markdown, then answers questions using that file content as context. The core payoff: private documents never need to be uploaded to a hosted AI service—everything runs through Ollama on the user’s machine, while a Streamlit interface handles the conversation experience.

The build centers on a distilled DeepSeek-R1 model, specifically the 14-billion-parameter distilled version, deployed via Ollama. The assistant is designed to read a local Data directory, extract text from .txt and .md files, and convert multi-page PDFs into a single text string. That extracted content is packaged into a structured context prompt so the model can ground its answers in the user’s own documents.

On the backend, the project uses a small set of components: a file loader that scans the Data folder using glob patterns, a PDF extraction step using a Python PDF library (described as fast and effective for simpler PDFs), and a chatbot module that manages conversation state. When the user asks a question, the system streams the model’s output chunk-by-chunk rather than waiting for a full response. Because DeepSeek-R1 is treated as a “thinking” model, the implementation optionally strips out the model’s internal <think> content before adding the final text to chat history—aimed at improving responsiveness and keeping the UI cleaner.

The conversation prompt is assembled through a “start message” that includes the file contents. Each file is wrapped in an XML-like template containing the file name and its text. This matters because it gives the model clear boundaries for what each document contains, reducing ambiguity when multiple files are present. The assistant then continues the dialogue by sending the accumulated message history—roles labeled as user or assistant—into Ollama’s chat endpoint, with temperature controlled by a configuration file.
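A single turn of this loop can be sketched as follows. The `CONFIG` dict, the `ask` helper, and the model tag are assumptions rather than the author's exact code, and the injectable `chat` parameter exists only so the sketch can be exercised without a running Ollama server:

```python
CONFIG = {"model": "deepseek-r1:14b", "temperature": 0.6}  # assumed config values

def ask(history: list[dict], question: str, chat=None) -> str:
    """Append the question, send the full message history to the model, store the reply."""
    if chat is None:  # default to Ollama's chat endpoint; injectable for testing
        import ollama
        chat = lambda msgs: ollama.chat(
            model=CONFIG["model"],
            messages=msgs,
            options={"temperature": CONFIG["temperature"]},
        )
    history.append({"role": "user", "content": question})
    reply = chat(history)["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```

Because the entire accumulated history is sent on every call, the model sees both the file-based system context and the prior dialogue each turn.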

The front end is a Streamlit app that ties everything together. It loads the files at startup, creates the initial chat history using the file-based system context, and renders messages with simple UI choices (including different avatars for user vs. assistant). Users can drop new documents into the Data directory and restart; the assistant automatically incorporates them into the next session.

In testing, the assistant correctly answers questions grounded in the resume and the Markdown excerpt—for example, identifying a GitHub username from the resume and confirming whether a YouTube channel exists. When asked about “Agents,” the response shifts to definitions found in the Markdown excerpt, demonstrating that the context packaging steers answers toward the relevant source document.

Overall, the approach turns local model inference into a usable product: file ingestion (Markdown/PDF), context construction, streaming chat, and a simple UI—so private documents can be queried directly on-device without external API calls.

Cornell Notes

The assistant is built to run locally by pairing DeepSeek-R1 (14B distilled) with Ollama and a Streamlit UI. It loads files from a local Data directory, extracts text from .txt/.md and multi-page PDFs, and wraps each file’s content in an XML-like template. For every user question, it sends the question plus the full message history (including the file-based context) to Ollama’s chat endpoint, streaming the response as it’s generated. Because DeepSeek-R1 can output internal reasoning tags, the implementation can remove <think> content before saving the assistant’s reply to chat history. The result is a private “chat with your files” workflow where documents stay on the user’s machine.

How does the system turn local files into model-ready context?

It scans the Data directory for supported patterns: text/Markdown files and PDFs. Text and Markdown are read directly as strings. PDFs are processed page-by-page using a Python PDF extraction library, then concatenated into one combined string so multi-page documents become a single context block. Each file’s name and extracted content are then inserted into an XML-like template so the model can distinguish between documents.
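A minimal sketch of that loader, assuming the third-party `pypdf` library for PDF extraction (the function name and directory layout are assumptions, not the author's exact code):

```python
from pathlib import Path

def load_files(data_dir: str = "data") -> dict[str, str]:
    """Map each supported file's name to its extracted text."""
    files: dict[str, str] = {}
    for path in sorted(Path(data_dir).glob("*")):
        if path.suffix in {".txt", ".md"}:
            # Text and Markdown are read directly as strings.
            files[path.name] = path.read_text(encoding="utf-8")
        elif path.suffix == ".pdf":
            from pypdf import PdfReader  # third-party PDF library (assumed choice)
            reader = PdfReader(str(path))
            # Extract every page and concatenate into one combined context string.
            files[path.name] = "\n".join(
                page.extract_text() or "" for page in reader.pages
            )
    return files
```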

What role does streaming play in the chatbot experience?

Instead of waiting for a complete model response, the chatbot iterates over streamed chunks from Ollama. As each chunk arrives, it appends the chunk content to the accumulating response. This reduces perceived latency and makes the UI feel more responsive, especially for longer answers.
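The accumulation step can be sketched like this; the chunk shape matches what the official `ollama` Python client yields when `stream=True`, and the helper name is an assumption:

```python
from typing import Iterable

def accumulate_stream(chunks: Iterable[dict]) -> str:
    """Concatenate streamed chunk contents into the full reply, rendering as we go."""
    response = ""
    for chunk in chunks:
        piece = chunk["message"]["content"]  # chunk shape used by the ollama client
        print(piece, end="", flush=True)     # show partial output immediately
        response += piece
    return response

# In the real app this would be fed by the client, e.g.:
#   stream = ollama.chat(model="deepseek-r1:14b", messages=history, stream=True)
#   reply = accumulate_stream(stream)
```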

Why remove the <think> portion from DeepSeek-R1 outputs?

DeepSeek-R1 is treated as a “thinking” model that may emit internal reasoning wrapped in <think> tags. The implementation optionally strips the <think> content before adding the assistant’s message to chat history. The stated goal is cleaner output in the chat UI and improved speed of the conversation flow.
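A minimal way to do this stripping, assuming the reasoning is wrapped in literal `<think>...</think>` tags:

```python
import re

# re.DOTALL lets the pattern span newlines inside the reasoning block.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_think(text: str) -> str:
    """Drop internal reasoning before the reply is stored in chat history."""
    return THINK_RE.sub("", text).strip()
```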

How does the assistant decide what to answer when multiple files are present?

It doesn’t “choose” files explicitly at runtime; instead, it provides all loaded files in the initial system context. The file contents are clearly labeled with file names inside the XML-like template. When the user asks a question (e.g., “What is the definition of Agents?”), the model tends to ground the answer in the portion of context that matches the query—shifting from resume facts to definitions found in the Markdown excerpt.
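The labeling step might look like the sketch below; the exact tag names and the instruction line are assumptions, since the source only describes "an XML-like template" containing file name and content:

```python
def build_context(files: dict[str, str]) -> str:
    """Wrap each file in an XML-like block so the model can tell documents apart."""
    blocks = [
        f"<file>\n<name>{name}</name>\n<content>\n{content}\n</content>\n</file>"
        for name, content in files.items()
    ]
    return "Answer using only the files below.\n" + "\n".join(blocks)
```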

What does the Streamlit layer contribute beyond the model logic?

Streamlit handles the user-facing interface: it loads files at startup, creates the initial chat history using the file context and a welcome message, and renders the conversation. It also shows which files are currently loaded in a sidebar, and formats user/assistant messages with different avatars for readability.
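A hedged sketch of that wiring. `init_history` is a hypothetical helper, and `load_files`/`build_context` in the commented app script stand in for the loader and context builder the article describes:

```python
def init_history(context: str) -> list[dict]:
    """Seed the chat with the file-based system context plus a welcome message."""
    return [
        {"role": "system", "content": context},
        {"role": "assistant", "content": "Hi! Ask me anything about your files."},
    ]

# Hypothetical wiring inside the Streamlit script:
#   import streamlit as st
#   if "history" not in st.session_state:
#       files = load_files("data")                       # scan Data once at startup
#       st.session_state.history = init_history(build_context(files))
#       st.sidebar.write("Loaded files:", list(files))   # sidebar lists what's in context
#   for msg in st.session_state.history[1:]:             # skip the system prompt
#       avatar = "🧑" if msg["role"] == "user" else "🤖"
#       st.chat_message(msg["role"], avatar=avatar).write(msg["content"])
```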

Review Questions

  1. If you add a new PDF to the Data directory, what changes are required to make the assistant use it in the next session?
  2. Where in the pipeline would you modify behavior if you wanted the assistant to keep or display the model’s <think> reasoning tags?
  3. How does the XML-like file templating influence the model’s ability to answer questions grounded in specific documents?

Key Points

  1. Run a private “chat with your files” assistant locally by deploying DeepSeek-R1 (14B distilled) with Ollama and connecting it to a Streamlit UI.
  2. Ingest documents by loading .txt/.md as raw text and converting multi-page PDFs into a single concatenated string for context.
  3. Build a structured context prompt that wraps each file’s name and content in an XML-like template to reduce ambiguity across multiple documents.
  4. Use Ollama chat streaming so responses appear incrementally rather than after the full generation completes.
  5. Optionally strip DeepSeek-R1 <think> tags before storing assistant replies to keep the chat history clean and responsive.
  6. Maintain conversation state by sending the full message history (user/assistant roles plus initial file-based system context) into each new model call.
  7. Add or change documents by updating the local Data directory; the assistant incorporates them on the next app run.

Highlights

The assistant keeps documents private by running DeepSeek-R1 locally through Ollama and using the file contents as prompt context—no upload workflow is described.
Multi-page PDFs are converted into a single text string by extracting each page and concatenating the results.
Streaming chunk-by-chunk output makes the chat feel faster, while optional removal of <think> tags cleans up what users see.
File context is packaged as labeled XML-like blocks (file name + content), helping answers track the right document when multiple sources are loaded.
