Build Private Chatbot with LangChain, Ollama and Qwen 2.5 | Local AI App with Private LLM
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A fully local “private chatbot” workflow can be built by combining LangChain’s message orchestration (via LangGraph), Ollama for on-device model serving, and Qwen 2.5 for the actual language reasoning—then grounding every answer in a filtered set of mobile app reviews stored in a local SQLite database. The practical payoff is immediate: the app can answer questions like “What are the top three most common issues users have?” by analyzing only the reviews that match user-specified filters (package, rating range, and review count), and it can keep conversational context so follow-up prompts build on earlier questions.
The demo starts with a simple command to run the app (filtering reviews with ratings of 1–2). A welcome screen appears, then the chatbot analyzes the selected reviews in real time on an M3 Pro machine. One example question asks for the top three recurring problems users report; the model returns a ranked list derived from the review text. A second prompt then asks for an implementation suggestion—summarized in two to three sentences—for a notification fix tied to the previously identified problem. That follow-up works because the system maintains chat history and reuses it when constructing the next model call.
Under the hood, the project uses UV as a fast package manager to install dependencies, then relies on Ollama to download and run the Qwen 2.5 model locally (the walkthrough uses the 14B parameter variant, quantized to 4-bit by default in the Ollama setup). The core application is a single main file (app.py) that wires together: (1) a review data model (package name, review text, rating), (2) a SQLite query that fetches only relevant reviews, and (3) a LangGraph state machine that manages conversation state.
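A plausible setup and run sequence for this stack; the exact dependency list and the app's flag names are assumptions rather than commands confirmed by the video (Ollama's default `qwen2.5:14b` tag does ship a 4-bit quantization):

```bash
# Create the project and add the libraries described above (package names assumed)
uv init private-chatbot && cd private-chatbot
uv add langchain langgraph langchain-ollama rich

# Download the 14B Qwen 2.5 variant; the default Ollama tag is 4-bit quantized
ollama pull qwen2.5:14b

# Hypothetical run matching the demo's 1-2 star filter (flag names are illustrative)
uv run app.py --min-rating 1 --max-rating 2 --max-reviews 100
```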
The SQLite layer filters reviews by rating bounds (ratings are treated as 0–5 inclusive), excludes empty review text, and limits the number of rows to a maximum count. Each review is formatted into a structured XML-like block (review text plus rating) and inserted into a system prompt. That prompt instructs the model to be helpful, accurate, and brief, and it frames the task as a “mobile app review analyzer.” LangChain’s chat prompt template then builds the message list by combining the system prompt content with the evolving chat history.
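A minimal sketch of how that query and prompt assembly might look; the table name (`reviews`), column names, and system prompt wording are assumptions, not the video's exact code:

```python
import sqlite3
from dataclasses import dataclass

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


@dataclass
class Review:
    package_name: str
    text: str
    rating: int


def fetch_reviews(db_path: str, package: str, min_rating: int,
                  max_rating: int, limit: int) -> list[Review]:
    # Filter by rating bounds, drop empty review text, and cap the row count.
    # Table and column names are assumed for illustration.
    query = """
        SELECT package_name, review_text, rating
        FROM reviews
        WHERE package_name = ?
          AND rating BETWEEN ? AND ?
          AND TRIM(review_text) != ''
        LIMIT ?
    """
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(query, (package, min_rating, max_rating, limit)).fetchall()
    return [Review(*row) for row in rows]


def format_review(review: Review) -> str:
    # XML-like delimiters make it obvious to the model where each review starts and ends.
    return f"<review>\n  <text>{review.text}</text>\n  <rating>{review.rating}</rating>\n</review>"


SYSTEM_PROMPT = """You are a mobile app review analyzer. Be helpful, accurate, and brief.
Base every answer only on the reviews below.

{reviews}"""

# The system message carries the review context; the placeholder carries the chat history.
# Usage: prompt.invoke({"reviews": "\n".join(map(format_review, reviews)), "messages": history})
prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    MessagesPlaceholder(variable_name="messages"),
])
```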
Model calls are made through a LangChain chat model wrapper pointed at Ollama, with temperature set to 0 for consistent outputs. The LangGraph “cyclic” workflow repeatedly invokes the model whenever the user submits a new message, appending the model’s response back into the in-memory message history. For this prototype, checkpointing is kept in RAM via a memory saver; the walkthrough notes that production deployments would likely persist checkpoints elsewhere.
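Under the same assumptions, the LangGraph wiring could look roughly like this; the node name and `thread_id` are illustrative, while `ChatOllama` and `MemorySaver` are the actual LangChain/LangGraph classes for Ollama-backed chat and in-memory checkpointing:

```python
from langchain_ollama import ChatOllama
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, MessagesState, StateGraph

# temperature=0 keeps outputs consistent for the same conversation state and inputs.
llm = ChatOllama(model="qwen2.5:14b", temperature=0)


def call_model(state: MessagesState) -> dict:
    # The full app would first render the system prompt + history through the
    # ChatPromptTemplate sketched above; here the history is passed directly.
    response = llm.invoke(state["messages"])
    return {"messages": [response]}  # appended to the message history by LangGraph


workflow = StateGraph(MessagesState)
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
workflow.add_edge("model", END)

# In-memory checkpointing; a production deployment would persist this elsewhere.
app = workflow.compile(checkpointer=MemorySaver())

# The thread_id ties successive invocations to the same saved conversation.
config = {"configurable": {"thread_id": "review-chat"}}
reply = app.invoke({"messages": [("user", "What are the top three issues?")]}, config)
print(reply["messages"][-1].content)
```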
The result is a compact (about 146 lines) local app that streams responses to a console UI using the Rich library, validates CLI arguments (min/max rating, max reviews), and loops indefinitely for interactive Q&A. The next steps hinted include adding a web interface, exposing an API, and extending the review-grounded chatbot into an agentic system using a newer framework (PydanticAI).
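A sketch of the streaming console loop, reusing the compiled `app` and `config` from the previous sketch; the exact Rich layout in the video may differ:

```python
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown

console = Console()


def stream_answer(question: str) -> None:
    # stream_mode="messages" yields (message_chunk, metadata) pairs as tokens arrive.
    buffer = []
    with Live(console=console, refresh_per_second=8) as live:
        for chunk, _metadata in app.stream(
            {"messages": [("user", question)]}, config, stream_mode="messages"
        ):
            buffer.append(chunk.content)
            live.update(Markdown("".join(buffer)))


# Interactive loop: keep answering until the user interrupts with Ctrl+C.
while True:
    stream_answer(console.input("[bold]You:[/] "))
```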
Cornell Notes
The walkthrough builds a fully local chatbot that answers questions using only on-device resources: Ollama serves Qwen 2.5, while LangChain and LangGraph manage prompt construction and conversation flow. Every response is grounded in a filtered set of mobile app reviews pulled from a local SQLite database using rating bounds and a maximum review count. Reviews are inserted into a structured XML-like format inside a system prompt, giving the model clear boundaries for where review content begins and ends. Chat history is preserved in memory so follow-up questions can reference prior answers. The app streams output in a console UI and keeps the model deterministic with temperature set to 0.
- How does the app ensure answers stay grounded in specific review data rather than general knowledge?
- What role do LangGraph and chat history play in follow-up questions?
- Why is temperature set to 0, and what effect does that have on the chatbot’s behavior?
- How are model serving and model selection handled locally?
- What does the SQLite query filter out, and how does that shape the analysis?
- How does the prototype handle persistence of chat state?
Review Questions
- If you wanted the chatbot to focus on only one specific app’s reviews, what parameters would you change and where would that filtering happen?
- Describe the full path from a user question to the model call: which components assemble the prompt and which components store the evolving messages?
- What trade-offs come from limiting the maximum number of reviews, and how might that affect the quality of “top common issues” answers?
Key Points
1. Build a local chatbot by pairing Ollama (for Qwen 2.5) with LangChain/LangGraph (for prompt assembly and conversation flow).
2. Ground every answer in a filtered SQLite dataset of mobile app reviews using rating bounds, non-empty review text, and a maximum review count.
3. Insert reviews into a structured XML-like format inside the system prompt to clearly delimit review content for the model.
4. Maintain conversational context by storing message history in LangGraph state and passing it into each subsequent model invocation.
5. Use temperature = 0 to make outputs consistent for the same conversation state and inputs.
6. Stream model responses in the console UI using the Rich library for a more interactive experience.
7. Keep prototype checkpointing in RAM, but plan for persistent storage if deploying beyond local experimentation.