Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
LangChain memory turns a basic chatbot into a conversational agent that remembers what was said earlier, with control over how much to keep, how to compress it, and how to retrieve the most relevant details later. The practical payoff is a chatbot that can sustain context across turns (buffer memory), stay within context limits by summarizing (summary memory), and pull back specific facts using semantic search (vector store memory). The walkthrough builds these memory types step by step and then combines them into a working “Dwight-style” chatbot that can generate sales talk while tracking prior exchanges.
The foundation is LangChain’s chat message history, which stores alternating human and AI messages as structured objects (HumanMessage and AIMessage). A conversation buffer memory then converts that history into a compact “history” field that is injected into the prompt for each new request. A conversation chain wires a ChatGPT model (temperature set to 0) to an initially empty memory: the chain answers a first question, then automatically appends the new user input and model output to memory. A follow-up question demonstrates the key behavior: the model can reference earlier context because the previous exchange is included in the next prompt.
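A minimal sketch of that flow, assuming the classic LangChain Python API (ChatMessageHistory, ConversationBufferMemory, ConversationChain); the example questions are placeholders, not the ones from the video:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ChatMessageHistory, ConversationBufferMemory

# Chat message history stores turns as structured HumanMessage/AIMessage objects.
history = ChatMessageHistory()
history.add_user_message("Hi there!")
history.add_ai_message("Hello! How can I help?")
print(history.messages)  # [HumanMessage(...), AIMessage(...)]

# Buffer memory wraps a history and renders it into the "history" prompt field.
memory = ConversationBufferMemory()
llm = ChatOpenAI(temperature=0)  # deterministic output, as in the walkthrough
chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# First turn: memory is empty, so the prompt contains only the new input.
chain.predict(input="Write a one-line pitch for our paper.")

# Second turn: the previous exchange is injected as "history",
# so the model can reference its earlier answer.
chain.predict(input="Now make that pitch shorter.")
```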
The prompt template can also be customized, swapping the “human” and “AI” speaker labels and even changing the persona. The example replaces the generic assistant with Dwight from The Office, instructing the model to respond in Dwight’s voice and pursue his goals. As the conversation grows, the accumulated messages can be exported to JSON via a messages-to-dictionary conversion and saved to disk, then reloaded later by reconstructing the chat message history from the saved structure.
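A sketch of the persona prompt and the save/reload round trip, under the same classic-API assumption; the template text and file name are illustrative, not taken verbatim from the video:

```python
import json

from langchain.memory import ChatMessageHistory, ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.schema import messages_from_dict, messages_to_dict

# Hypothetical persona template; it must expose the {history} and {input}
# variables that ConversationChain and the memory expect.
template = """You are Dwight K. Schrute from The Office. Respond in
Dwight's voice and pursue his goals.

Conversation so far:
{history}
Human: {input}
Dwight:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

# Relabel the AI speaker so the rendered history matches the template.
memory = ConversationBufferMemory(ai_prefix="Dwight")

# ... after a few turns, export the raw messages to JSON and save them ...
with open("conversation.json", "w") as f:
    json.dump(messages_to_dict(memory.chat_memory.messages), f)

# ... then reload later by rebuilding the chat message history.
with open("conversation.json") as f:
    restored = ChatMessageHistory(messages=messages_from_dict(json.load(f)))
restored_memory = ConversationBufferMemory(chat_memory=restored, ai_prefix="Dwight")
```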
Context limits drive the next memory strategies. Conversation buffer window memory keeps only the last K messages (set to k=1 in the example), so older details drop out and the model becomes less specific—illustrated when a question about “five frames of paper” yields a generic answer because earlier details are no longer in the prompt. Conversation summary buffer memory takes the opposite approach: it compresses the full conversation into a running summary once a token threshold is reached. In the example, the summary retains the important fact that the sales email quoted five frames of paper, enabling later extraction of that detail—something the windowed approach fails to preserve.
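Both strategies drop into the same conversation chain; a sketch under the same classic-API assumption (the max_token_limit value here is illustrative, not from the video):

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import (
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)

llm = ChatOpenAI(temperature=0)

# Window memory: keep only the last k=1 exchange; anything older simply
# falls out of the prompt, so earlier specifics are lost.
window_chain = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=1),
)

# Summary buffer memory: once the buffered turns exceed max_token_limit,
# older turns are folded into a running LLM-generated summary that is
# injected alongside the most recent messages.
summary_chain = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=100),
)
```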
Finally, vector store memory uses Chroma DB plus OpenAI embeddings to store conversation turns as searchable vectors. A retriever memory then selects the most relevant past snippets for a new query (top result with k=1), and a custom prompt injects only those relevant pieces. When asked how the paper stands out, the chatbot retrieves the earlier “best in the business / strong durable” content and rephrases it into a polished answer.
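A sketch of the vector-store variant, assuming the classic API with Chroma and OpenAI embeddings; the prompt wording is an assumption:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

# Each saved turn is embedded and stored in Chroma; the retriever pulls
# back only the single most similar snippet (k=1) for a new query.
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
memory = VectorStoreRetrieverMemory(retriever=retriever)

# The prompt injects only the retrieved snippets, not the full transcript.
template = """Relevant pieces of previous conversation:
{history}

Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

chain = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt,
    memory=memory,
)

# Turns are saved to the vector store automatically as the chain runs.
chain.predict(input="How does your paper stand out?")
```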
The session concludes by combining the learned pieces into a simple interactive chatbot loop in a Google Colab notebook, using conversation buffer memory and a Dwight persona prompt. The result is a working template for building chatbots that remember—whether by storing everything, summarizing, or retrieving semantically relevant facts.
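A minimal version of that closing loop, combining buffer memory with a persona prompt; details such as the exit word are assumptions:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# Hypothetical persona template, as in the earlier sketch.
template = """You are Dwight K. Schrute from The Office, selling paper.

Conversation so far:
{history}
Human: {input}
Dwight:"""

chain = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    prompt=PromptTemplate(input_variables=["history", "input"], template=template),
    memory=ConversationBufferMemory(ai_prefix="Dwight"),
)

# Simple REPL-style loop; type "quit" to stop.
while True:
    text = input("You: ")
    if text.strip().lower() == "quit":
        break
    print("Dwight:", chain.predict(input=text))
```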
Cornell Notes
LangChain memory lets a chatbot carry context across turns by injecting prior conversation into each new prompt. Conversation buffer memory stores the full chat history in a “history” field, enabling follow-up questions to reference earlier answers. Conversation buffer window memory limits context to the last K messages, which can make the model generic when key details fall out. Conversation summary buffer memory compresses long chats into a running summary, preserving important facts like the quoted quantity. Vector store memory (Chroma DB + OpenAI embeddings) retrieves the most relevant past snippets via semantic search, so the model can answer targeted questions using only what matters.
How does conversation buffer memory make a chatbot “remember” earlier turns?
What changes when using conversation buffer window memory with k=1?
How does conversation summary buffer memory preserve key facts under context limits?
Why use vector store memory, and how does it retrieve relevant context?
What does persona/prompt customization change in a memory-based chatbot?
How can conversation memory be saved and reloaded?
Review Questions
- When would window memory be a poor fit compared to summary memory? Give a concrete example from the described behavior.
- How does vector store retrieval differ from buffer-based injection in terms of what gets included in the prompt?
- What information must be preserved in a summary so that later questions can be answered accurately?
Key Points
1. LangChain memory works by injecting stored conversation context into the prompt for each new model call.
2. Conversation buffer memory preserves full chat history, enabling accurate follow-ups but risking context-limit issues.
3. Conversation buffer window memory keeps only the last K messages, which can cause the model to lose earlier specifics.
4. Conversation summary buffer memory compresses long chats into a running summary, retaining key facts for later extraction.
5. Vector store memory uses Chroma DB and OpenAI embeddings to retrieve semantically relevant past turns instead of replaying the entire transcript.
6. Prompt templates can be customized to change persona and labeling while still using the same memory mechanisms.
7. Conversation messages can be exported to JSON and reloaded to persist chat state across runs.