
Build Private Chatbot with LangChain, Ollama and Qwen 2.5 | Local AI App with Private LLM

Venelin Valkov · 5 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Build a local chatbot by pairing Ollama (for Qwen 2.5) with LangChain/LangGraph (for prompt assembly and conversation flow).

Briefing

A fully local “private chatbot” workflow can be built by combining LangChain’s message orchestration (via LangGraph), Ollama for on-device model serving, and Qwen 2.5 for the actual language reasoning—then grounding every answer in a filtered set of mobile app reviews stored in a local SQLite database. The practical payoff is immediate: the app can answer questions like “What are the top three most common issues users have?” by analyzing only the reviews that match user-specified filters (package, rating range, and review count), and it can keep conversational context so follow-up prompts build on earlier questions.

The demo starts with a simple command to run the app (filtering reviews with ratings of 1–2). A welcome screen appears, then the chatbot analyzes the selected reviews in real time on an M3 Pro machine. One example question asks for the top three recurring problems users report; the model returns a ranked list derived from the review text. A second prompt then asks for an implementation suggestion—summarized in two to three sentences—for a notification fix tied to the previously identified problem. That follow-up works because the system maintains chat history and reuses it when constructing the next model call.

Under the hood, the project uses UV as a fast package manager to install dependencies, then relies on Ollama to download and run the Qwen 2.5 model locally (the walkthrough uses the 14B parameter variant, quantized to 4-bit by default in the Ollama setup). The core application is a single main file (app.py) that wires together: (1) a review data model (package name, review text, rating), (2) a SQLite query that fetches only relevant reviews, and (3) a LangGraph state machine that manages conversation state.
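The review data model described above can be sketched as a simple dataclass. The field names here are illustrative assumptions; the video's actual app.py may define the model differently (for example, as a Pydantic model):

```python
from dataclasses import dataclass

# Illustrative field names only; the actual app.py may differ.
@dataclass(frozen=True)
class Review:
    package: str   # app package name, e.g. "com.example.app"
    content: str   # review text
    score: int     # star rating, 0-5

review = Review(package="com.example.app",
                content="Notifications stopped working after the update.",
                score=1)
print(review.score)  # → 1
```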

The SQLite layer filters reviews by rating bounds (ratings are treated as 0–5 inclusive), excludes empty review text, and limits the number of rows to a maximum count. Each review is formatted into a structured XML-like block (review text plus rating) and inserted into a system prompt. That prompt instructs the model to be helpful, accurate, and brief, and it frames the task as a “mobile app review analyzer.” LangChain’s chat prompt template then builds the message list by combining the system prompt content with the evolving chat history.
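A minimal sketch of that filtering and formatting step, using an in-memory SQLite database (table and column names are assumptions, not taken from the video's code):

```python
import sqlite3

def fetch_reviews(conn, package, min_rating, max_rating, limit):
    """Fetch non-empty reviews for a package within an inclusive rating range."""
    return conn.execute(
        """
        SELECT content, score FROM reviews
        WHERE package = ?
          AND score BETWEEN ? AND ?
          AND TRIM(content) != ''
        LIMIT ?
        """,
        (package, min_rating, max_rating, limit),
    ).fetchall()

def format_reviews(rows):
    """Wrap each review in an XML-like block so the model sees clear boundaries."""
    return "\n".join(
        f"<review>\n<text>{text}</text>\n<rating>{score}</rating>\n</review>"
        for text, score in rows
    )

# Demo with an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (package TEXT, content TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [
        ("com.example.app", "Notifications never arrive", 1),
        ("com.example.app", "", 2),            # excluded: empty review text
        ("com.example.app", "Great app!", 5),  # excluded: rating out of range
    ],
)
rows = fetch_reviews(conn, "com.example.app", 1, 2, 100)
print(format_reviews(rows))
```

The formatted blocks are then spliced into the system prompt so the model can tell exactly where review content begins and ends.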

Model calls are made through a LangChain chat model wrapper pointed at Ollama, with temperature set to 0 for consistent outputs. The LangGraph “cyclic” workflow repeatedly invokes the model whenever the user submits a new message, appending the model’s response back into the in-memory message history. For this prototype, checkpointing is kept in RAM via a memory saver; the walkthrough notes that production deployments would likely persist checkpoints elsewhere.
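The message-assembly loop can be sketched in plain Python. In the real app this bookkeeping is handled by LangGraph's state and a LangChain chat-model wrapper pointed at Ollama (temperature 0); the stand-in reply string below is only there to keep the sketch runnable without a local model:

```python
# Stdlib-only sketch of the conversation loop; the real app delegates this
# to LangGraph state plus a ChatOllama-style model wrapper.
def build_messages(system_prompt, history, user_input):
    """Combine the review-grounded system prompt, prior turns, and the new question."""
    return [{"role": "system", "content": system_prompt},
            *history,
            {"role": "user", "content": user_input}]

history = []
system_prompt = "You are a mobile app review analyzer. <reviews>...</reviews>"

# Turn 1
msgs = build_messages(system_prompt, history, "What are the top 3 issues?")
reply = "1. Notifications 2. Crashes 3. Battery drain"  # stand-in for model output
history += [msgs[-1], {"role": "assistant", "content": reply}]

# Turn 2: the follow-up sees the earlier answer because history is replayed
msgs = build_messages(system_prompt, history, "Suggest a fix for issue #1.")
print(len(msgs))  # → 4 (system + two prior turns + new question)
```

Replaying the history on every call is what lets a follow-up like "suggest a fix for issue #1" resolve against the model's previous answer.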

The result is a compact (about 146 lines) local app that streams responses to a console UI using the Rich library, validates CLI arguments (min/max rating, max reviews), and loops indefinitely for interactive Q&A. Hinted next steps include adding a web interface, exposing an API, and extending the review-grounded chatbot into an agentic system built on a newer framework (PydanticAI).
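The CLI validation might look like the following argparse sketch. The flag names and defaults are assumptions for illustration, not taken from the video:

```python
import argparse

# Flag names and defaults are illustrative; the actual app.py may differ.
def build_parser():
    p = argparse.ArgumentParser(description="Review-grounded local chatbot")
    p.add_argument("--min-rating", type=int, default=0)
    p.add_argument("--max-rating", type=int, default=5)
    p.add_argument("--max-reviews", type=int, default=100)
    return p

def validate(args):
    """Enforce the 0-5 inclusive rating range and a positive review cap."""
    if not (0 <= args.min_rating <= args.max_rating <= 5):
        raise SystemExit("ratings must satisfy 0 <= min <= max <= 5")
    if args.max_reviews < 1:
        raise SystemExit("--max-reviews must be at least 1")
    return args

args = validate(build_parser().parse_args(["--min-rating", "1", "--max-rating", "2"]))
print(args.min_rating, args.max_rating, args.max_reviews)  # → 1 2 100
```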

Cornell Notes

The walkthrough builds a fully local chatbot that answers questions using only on-device resources: Ollama serves Qwen 2.5, while LangChain and LangGraph manage prompt construction and conversation flow. Every response is grounded in a filtered set of mobile app reviews pulled from a local SQLite database using rating bounds and a maximum review count. Reviews are inserted into a structured XML-like format inside a system prompt, giving the model clear boundaries for where review content begins and ends. Chat history is preserved in memory so follow-up questions can reference prior answers. The app streams output in a console UI and keeps the model deterministic with temperature set to 0.

How does the app ensure answers stay grounded in specific review data rather than general knowledge?

It fetches reviews from a local SQLite database using user-provided filters (package name, minimum/maximum rating, and a maximum number of reviews). Each selected review is formatted into a structured XML-like block containing the review text and rating, then injected into a system prompt. The prompt instructs the model to analyze those reviews and answer questions about them, with explicit boundaries around where the review content starts and ends.

What role do LangGraph and chat history play in follow-up questions?

LangGraph maintains an application state that includes both the selected review context and the evolving message history. When a user submits a new prompt, the workflow constructs a new message list that includes the system prompt plus prior chat messages. The model is then invoked with that combined context, so follow-ups can reference earlier outputs (for example, proposing a notification fix based on the previously identified top issue).

Why is temperature set to 0, and what effect does that have on the chatbot’s behavior?

Temperature controls randomness in generation. Setting temperature to 0 pushes the model toward deterministic, repeatable outputs for the same inputs and chat state. In the walkthrough, this is used to make repeated questions within the same conversation produce consistent responses.

How are model serving and model selection handled locally?

Ollama is used to run the model on the machine. The walkthrough instructs installing Ollama, then downloading the Qwen 2.5 model via Ollama. The selected model is the 14B parameter variant, quantized to 4-bit by default in the Ollama setup, enabling local inference without requiring a remote API.

What does the SQLite query filter out, and how does that shape the analysis?

The query restricts results to ratings within a specified inclusive range (ratings treated as 0–5) and excludes rows where the review text is empty (so the model isn’t fed blank content). It also limits the number of returned reviews to a maximum count, which caps how much text the model must analyze per question and keeps the app responsive.

How does the prototype handle persistence of chat state?

For this prototype, chat history is stored in RAM using a memory saver in the LangGraph workflow. That means the conversation state persists only while the app is running on that machine. The walkthrough notes that a production version would likely use a different checkpointing/persistence strategy (e.g., a database or file-based storage).
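One simple direction for such persistence (not the approach shown in the video, which uses LangGraph's in-memory saver; LangGraph also ships database-backed checkpointers) is to mirror each turn into a SQLite table:

```python
import json
import sqlite3

# Illustrative only: shows the general idea of persisting turns outside RAM.
conn = sqlite3.connect(":memory:")  # a file path would survive restarts

conn.execute(
    "CREATE TABLE IF NOT EXISTS chat_history (thread_id TEXT, turn INTEGER, message TEXT)"
)

def save_turn(thread_id, turn, message):
    """Store one chat message as JSON, keyed by conversation thread and turn order."""
    conn.execute("INSERT INTO chat_history VALUES (?, ?, ?)",
                 (thread_id, turn, json.dumps(message)))

def load_history(thread_id):
    """Rebuild a thread's message list in turn order."""
    rows = conn.execute(
        "SELECT message FROM chat_history WHERE thread_id = ? ORDER BY turn",
        (thread_id,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]

save_turn("t1", 0, {"role": "user", "content": "Top issues?"})
save_turn("t1", 1, {"role": "assistant", "content": "Notifications, crashes."})
print(len(load_history("t1")))  # → 2
```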

Review Questions

  1. If you wanted the chatbot to focus on only one specific app’s reviews, what parameters would you change and where would that filtering happen?
  2. Describe the full path from a user question to the model call: which components assemble the prompt and which components store the evolving messages?
  3. What trade-offs come from limiting the maximum number of reviews, and how might that affect the quality of “top common issues” answers?

Key Points

  1. Build a local chatbot by pairing Ollama (for Qwen 2.5) with LangChain/LangGraph (for prompt assembly and conversation flow).
  2. Ground every answer in a filtered SQLite dataset of mobile app reviews using rating bounds, non-empty review text, and a maximum review count.
  3. Insert reviews into a structured XML-like format inside the system prompt to clearly delimit review content for the model.
  4. Maintain conversational context by storing message history in LangGraph state and passing it into each subsequent model invocation.
  5. Use temperature = 0 to make outputs consistent for the same conversation state and inputs.
  6. Stream model responses in the console UI using the Rich library for a more interactive experience.
  7. Keep prototype checkpointing in RAM, but plan for persistent storage if deploying beyond local experimentation.

Highlights

The chatbot answers review questions by injecting only the selected SQLite reviews into a system prompt, with explicit XML-like boundaries around each review.
Follow-up prompts work because LangGraph preserves chat history in state and reuses it when constructing the next model call.
Running Qwen 2.5 locally is handled through Ollama, using the 14B model quantized to 4-bit by default in the setup described.
