Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
LangChain memory turns a basic chatbot into a conversational agent that remembers what was said earlier, with control over how much to keep, how to compress it, and how to retrieve the most relevant details later. The practical payoff is a chatbot that can sustain context across turns (buffer memory), stay within context limits by summarizing (summary memory), and pull back specific facts using semantic search (vector store memory). The walkthrough builds these memory types step by step and then combines them into a working “Dwight-style” chatbot that can generate sales talk while tracking prior exchanges.
The foundation is LangChain’s chat message history, which stores alternating human and AI messages as structured objects (HumanMessage and AIMessage). A conversation buffer memory then converts that history into a compact “history” field that is injected into the prompt for each new request. A conversation chain wires a ChatGPT model (temperature set to 0) to an initially empty memory: the chain answers a first question, then automatically appends the new user input and model output to memory. A follow-up question demonstrates the key behavior: the model can reference earlier context because the previous exchange is included in the next prompt.
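A minimal sketch of that flow, assuming the classic LangChain Python API (ChatMessageHistory, ConversationBufferMemory, ConversationChain); the example questions are placeholders, not the ones from the video:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ChatMessageHistory, ConversationBufferMemory

# Chat message history stores turns as structured HumanMessage/AIMessage objects.
history = ChatMessageHistory()
history.add_user_message("Hi there!")
history.add_ai_message("Hello! How can I help?")
print(history.messages)  # [HumanMessage(...), AIMessage(...)]

# Buffer memory wraps a history and renders it into the "history" prompt field.
memory = ConversationBufferMemory()
llm = ChatOpenAI(temperature=0)  # deterministic output, as in the walkthrough
chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# First turn: memory is empty, so the prompt contains only the new input.
chain.predict(input="Write a one-line pitch for our paper.")

# Second turn: the previous exchange is injected as "history",
# so the model can reference its earlier answer.
chain.predict(input="Now make that pitch shorter.")
```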
The prompt template can also be customized, swapping the “human” and “AI” speaker labels and even changing the persona. The example replaces the generic assistant with Dwight from The Office, instructing the model to respond in Dwight’s voice and pursue his goals. As the conversation grows, the accumulated messages can be exported to JSON via a messages-to-dictionary conversion and saved to disk, then reloaded later by reconstructing the chat message history from the saved structure.
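A sketch of the persona prompt and the save/reload round trip, under the same classic-API assumption; the template text and file name are illustrative, not taken verbatim from the video:

```python
import json

from langchain.memory import ChatMessageHistory, ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.schema import messages_from_dict, messages_to_dict

# Hypothetical persona template; it must expose the {history} and {input}
# variables that ConversationChain and the memory expect.
template = """You are Dwight K. Schrute from The Office. Respond in
Dwight's voice and pursue his goals.

Conversation so far:
{history}
Human: {input}
Dwight:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

# Relabel the AI speaker so the rendered history matches the template.
memory = ConversationBufferMemory(ai_prefix="Dwight")

# ... after a few turns, export the raw messages to JSON and save them ...
with open("conversation.json", "w") as f:
    json.dump(messages_to_dict(memory.chat_memory.messages), f)

# ... then reload later by rebuilding the chat message history.
with open("conversation.json") as f:
    restored = ChatMessageHistory(messages=messages_from_dict(json.load(f)))
restored_memory = ConversationBufferMemory(chat_memory=restored, ai_prefix="Dwight")
```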
Context limits drive the next memory strategies. Conversation buffer window memory keeps only the last K messages (set to k=1 in the example), so older details drop out and the model becomes less specific—illustrated when a question about “five frames of paper” yields a generic answer because earlier details are no longer in the prompt. Conversation summary buffer memory takes the opposite approach: it compresses the full conversation into a running summary once a token threshold is reached. In the example, the summary retains the important fact that the sales email quoted five frames of paper, enabling later extraction of that detail—something the windowed approach fails to preserve.
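Both strategies drop into the same conversation chain; a sketch under the same classic-API assumption (the max_token_limit value here is illustrative, not from the video):

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import (
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)

llm = ChatOpenAI(temperature=0)

# Window memory: keep only the last k=1 exchange; anything older simply
# falls out of the prompt, so earlier specifics are lost.
window_chain = ConversationChain(
    llm=llm,
    memory=ConversationBufferWindowMemory(k=1),
)

# Summary buffer memory: once the buffered turns exceed max_token_limit,
# older turns are folded into a running LLM-generated summary that is
# injected alongside the most recent messages.
summary_chain = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=100),
)
```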
Finally, vector store memory uses Chroma DB plus OpenAI embeddings to store conversation turns as searchable vectors. A retriever memory then selects the most relevant past snippets for a new query (top result with k=1), and a custom prompt injects only those relevant pieces. When asked how the paper stands out, the chatbot retrieves the earlier “best in the business / strong durable” content and rephrases it into a polished answer.
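A sketch of the vector-store variant, assuming the classic API with Chroma and OpenAI embeddings; the prompt wording is an assumption:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

# Each saved turn is embedded and stored in Chroma; the retriever pulls
# back only the single most similar snippet (k=1) for a new query.
vectorstore = Chroma(embedding_function=OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
memory = VectorStoreRetrieverMemory(retriever=retriever)

# The prompt injects only the retrieved snippets, not the full transcript.
template = """Relevant pieces of previous conversation:
{history}

Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

chain = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt,
    memory=memory,
)

# Turns are saved to the vector store automatically as the chain runs.
chain.predict(input="How does your paper stand out?")
```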
The session concludes by combining the learned pieces into a simple interactive chatbot loop in a Google Colab notebook, using conversation buffer memory and a Dwight persona prompt. The result is a working template for building chatbots that remember—whether by storing everything, summarizing, or retrieving semantically relevant facts.
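A minimal version of that closing loop, combining buffer memory with a persona prompt; details such as the exit word are assumptions:

```python
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# Hypothetical persona template, as in the earlier sketch.
template = """You are Dwight K. Schrute from The Office, selling paper.

Conversation so far:
{history}
Human: {input}
Dwight:"""

chain = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    prompt=PromptTemplate(input_variables=["history", "input"], template=template),
    memory=ConversationBufferMemory(ai_prefix="Dwight"),
)

# Simple REPL-style loop; type "quit" to stop.
while True:
    text = input("You: ")
    if text.strip().lower() == "quit":
        break
    print("Dwight:", chain.predict(input=text))
```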
Cornell Notes
LangChain memory lets a chatbot carry context across turns by injecting prior conversation into each new prompt. Conversation buffer memory stores the full chat history in a “history” field, enabling follow-up questions to reference earlier answers. Conversation buffer window memory limits context to the last K messages, which can make the model generic when key details fall out. Conversation summary buffer memory compresses long chats into a running summary, preserving important facts like the quoted quantity. Vector store memory (Chroma DB + OpenAI embeddings) retrieves the most relevant past snippets via semantic search, so the model can answer targeted questions using only what matters.
How does conversation buffer memory make a chatbot “remember” earlier turns?
What changes when using conversation buffer window memory with k=1?
How does conversation summary buffer memory preserve key facts under context limits?
Why use vector store memory, and how does it retrieve relevant context?
What does persona/prompt customization change in a memory-based chatbot?
How can conversation memory be saved and reloaded?
Review Questions
- When would window memory be a poor fit compared to summary memory? Give a concrete example from the described behavior.
- How does vector store retrieval differ from buffer-based injection in terms of what gets included in the prompt?
- What information must be preserved in a summary so that later questions can be answered accurately?
Key Points
1. LangChain memory works by injecting stored conversation context into the prompt for each new model call.
2. Conversation buffer memory preserves full chat history, enabling accurate follow-ups but risking context-limit issues.
3. Conversation buffer window memory keeps only the last K messages, which can cause the model to lose earlier specifics.
4. Conversation summary buffer memory compresses long chats into a running summary, retaining key facts for later extraction.
5. Vector store memory uses Chroma DB and OpenAI embeddings to retrieve semantically relevant past turns instead of replaying the entire transcript.
6. Prompt templates can be customized to change persona and labeling while still using the same memory mechanisms.
7. Conversation messages can be exported to JSON and reloaded to persist chat state across runs.