Use DeepSeek-R1 to Chat with Your Files Privately: 100% Local AI Assistant with Ollama
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Run a private “chat with your files” assistant locally by deploying DeepSeek-R1 (14B distilled) with Ollama and connecting it to a Streamlit UI.
Briefing
A fully local “chat with your files” assistant is now practical by combining DeepSeek-R1 running locally with a lightweight app that ingests PDFs and Markdown, then answers questions using that file content as context. The core payoff: private documents never need to be uploaded to a hosted AI service—everything runs through Ollama on the user’s machine, while a Streamlit interface handles the conversation experience.
The build centers on DeepSeek-R1, specifically the 14-billion-parameter distilled variant, deployed via Ollama. The assistant is designed to read a local Data directory, extract text from .txt and .md files, and convert multi-page PDFs into a single text string. That extracted content is packaged into a structured context prompt so the model can ground its answers in the user’s own documents.
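A minimal sketch of what that ingestion step could look like, assuming the pypdf package for PDF text extraction; the function names and the data-directory path are illustrative rather than the project’s actual code:

```python
# Illustrative ingestion sketch (assumed names, assumed pypdf dependency).
from pathlib import Path
from pypdf import PdfReader

DATA_DIR = Path("data")  # local directory holding the user's files

def extract_pdf_text(path: Path) -> str:
    """Concatenate text from every page of a PDF into one string."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def load_documents(data_dir: Path = DATA_DIR) -> dict[str, str]:
    """Return {file name: extracted text} for .txt, .md, and .pdf files."""
    documents = {}
    for path in sorted(data_dir.glob("*")):
        if path.suffix in {".txt", ".md"}:
            documents[path.name] = path.read_text(encoding="utf-8")
        elif path.suffix == ".pdf":
            documents[path.name] = extract_pdf_text(path)
    return documents
```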
On the backend, the project uses a small set of components: a file loader that scans the Data folder using glob patterns, a PDF extraction step using a Python PDF library (described as fast and effective for simpler PDFs), and a chatbot module that manages conversation state. When the user asks a question, the system streams the model’s output chunk-by-chunk rather than waiting for a full response. Because DeepSeek-R1 is treated as a “thinking” model, the implementation optionally strips out the model’s internal <think> content before adding the final text to chat history—aimed at improving responsiveness and keeping the UI cleaner.
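A rough sketch of the streaming and <think>-stripping pieces, using the ollama Python client’s chat(..., stream=True) call; the model tag and helper names are assumptions, not the project’s actual module layout:

```python
# Stream chunks from a local Ollama server and optionally remove the model's
# internal <think> reasoning before the reply is stored in chat history.
import re
import ollama

def stream_reply(messages: list[dict], model: str = "deepseek-r1:14b"):
    """Yield response text chunk-by-chunk as it arrives from Ollama."""
    for chunk in ollama.chat(model=model, messages=messages, stream=True):
        yield chunk["message"]["content"]

def strip_think(text: str) -> str:
    """Drop DeepSeek-R1's <think>...</think> block, keeping only the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```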
The conversation prompt is assembled through a “start message” that includes the file contents. Each file is wrapped in an XML-like template containing the file name and its text. This matters because it gives the model clear boundaries for what each document contains, reducing ambiguity when multiple files are present. The assistant then continues the dialogue by sending the accumulated message history—roles labeled as user or assistant—into Ollama’s chat endpoint, with temperature controlled by a configuration file.
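A hedged sketch of how the start message and follow-up calls might be assembled; the exact XML-like tag names, the model tag, and the default temperature (which the project reads from a configuration file) are assumptions:

```python
# Wrap each file in an XML-like block, build the initial context message, and
# send the accumulated history to Ollama's chat endpoint.
import ollama

FILE_TEMPLATE = (
    "<file>\n<name>{name}</name>\n<content>\n{content}\n</content>\n</file>"
)

def build_start_message(documents: dict[str, str]) -> dict:
    """Create the file-based context message that seeds the chat history."""
    files_block = "\n".join(
        FILE_TEMPLATE.format(name=name, content=content)
        for name, content in documents.items()
    )
    return {
        "role": "system",
        "content": f"Answer using only the files below.\n{files_block}",
    }

def ask(history: list[dict], question: str, temperature: float = 0.6) -> str:
    """Append the question, call the model with the full history, store the reply."""
    history.append({"role": "user", "content": question})
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=history,
        options={"temperature": temperature},  # would come from config in the project
    )
    answer = response["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer
```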
The front end is a Streamlit app that ties everything together. It loads the files at startup, creates the initial chat history using the file-based system context, and renders messages with simple UI choices (including different avatars for user vs. assistant). Users can drop new documents into the Data directory and restart; the assistant automatically incorporates them into the next session.
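A rough Streamlit layout illustrating that flow; st.chat_message, st.chat_input, and st.write_stream are real Streamlit APIs, while the helpers reused from the sketches above (load_documents, build_start_message, stream_reply, strip_think) remain illustrative:

```python
# Minimal chat loop: load files once, keep history in session state, render
# messages with per-role avatars, and stream each new assistant reply.
import streamlit as st

st.title("Chat with your files")

if "history" not in st.session_state:
    docs = load_documents()                      # from the ingestion sketch
    st.session_state.history = [build_start_message(docs)]

for msg in st.session_state.history[1:]:         # skip the file-context message
    avatar = "🧑" if msg["role"] == "user" else "🤖"
    with st.chat_message(msg["role"], avatar=avatar):
        st.markdown(msg["content"])

if question := st.chat_input("Ask about your documents"):
    st.session_state.history.append({"role": "user", "content": question})
    with st.chat_message("user", avatar="🧑"):
        st.markdown(question)
    with st.chat_message("assistant", avatar="🤖"):
        streamed = st.write_stream(stream_reply(st.session_state.history))
    st.session_state.history.append(
        {"role": "assistant", "content": strip_think(streamed)}
    )
```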
In testing with the sample documents (a resume and a Markdown excerpt), the assistant correctly answers questions grounded in them, for example identifying a GitHub username from the resume and confirming whether a YouTube channel exists. When asked about “Agents,” the response shifts to definitions found in the Markdown excerpt, demonstrating that the context packaging steers answers toward the relevant source document.
Overall, the approach turns local model inference into a usable product: file ingestion (Markdown/PDF), context construction, streaming chat, and a simple UI—so private documents can be queried directly on-device without external API calls.
Cornell Notes
The assistant is built to run locally by pairing DeepSeek-R1 (14B distilled) with Ollama and a Streamlit UI. It loads files from a local Data directory, extracts text from .txt/.md and multi-page PDFs, and wraps each file’s content in an XML-like template. For every user question, it sends the question plus the full message history (including the file-based context) to Ollama’s chat endpoint, streaming the response as it’s generated. Because DeepSeek-R1 can output internal reasoning tags, the implementation can remove <think> content before saving the assistant’s reply to chat history. The result is a private “chat with your files” workflow where documents stay on the user’s machine.
How does the system turn local files into model-ready context?
What role does streaming play in the chatbot experience?
Why remove the <think> portion from DeepSeek-R1 outputs?
How does the assistant decide what to answer when multiple files are present?
What does the Streamlit layer contribute beyond the model logic?
Review Questions
- If you add a new PDF to the Data directory, what changes are required to make the assistant use it in the next session?
- Where in the pipeline would you modify behavior if you wanted the assistant to keep or display the model’s <think> reasoning tags?
- How does the XML-like file templating influence the model’s ability to answer questions grounded in specific documents?
Key Points
1. Run a private “chat with your files” assistant locally by deploying DeepSeek-R1 (14B distilled) with Ollama and connecting it to a Streamlit UI.
2. Ingest documents by loading .txt/.md as raw text and converting multi-page PDFs into a single concatenated string for context.
3. Build a structured context prompt that wraps each file’s name and content in an XML-like template to reduce ambiguity across multiple documents.
4. Use Ollama chat streaming so responses appear incrementally rather than after the full generation completes.
5. Optionally strip DeepSeek-R1 <think> tags before storing assistant replies to keep the chat history clean and responsive.
6. Maintain conversation state by sending the full message history (user/assistant roles plus the initial file-based system context) into each new model call.
7. Add or change documents by updating the local Data directory; the assistant incorporates them on the next app run.