Long Term Memory in LangGraph
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Long-term memory personalizes chatbot responses by extracting stable user facts from messages, storing them persistently, and retrieving relevant memories before answering.
Briefing
Long-term memory is the missing ingredient for chatbots that feel personal over time: instead of treating every conversation as brand-new, the system extracts user-specific facts from messages, stores them in a persistent memory backend, and consults that memory before generating each reply. In this LangGraph walkthrough, the core idea is straightforward but production-relevant: keep a per-user “memory store” that survives across sessions, then inject the most relevant stored details into the LLM’s prompt so responses match the user’s preferences, projects, and ongoing context.
The walkthrough starts by recapping why long-term memory matters in multi-thread chat experiences. Users rarely share everything in one place; they reveal different pieces of identity and preferences across separate conversations—like programming language choices in one thread, travel plans in another, and professional or philosophical interests elsewhere. A long-term memory store solves the fragmentation problem by persistently saving extracted facts and retrieving them later. The LLM then checks the memory store before answering, using stored preferences to tailor outputs—for example, generating code in Python after learning that the user prefers Python.
From there, the implementation shifts into LangGraph’s memory-store architecture. The key abstraction is an abstract “Base Store” class that defines the core operations: creating new memories (put), fetching a specific memory by key (get), searching across memories (search), updating, and deleting. Concrete implementations inherit from this base: an in-memory store for quick prototyping, a Postgres-backed store for production persistence, and a Redis-backed store as another production option. The practical lesson is that long-term memory in LangGraph is organized around a namespacing scheme—effectively folder-like partitions—so each user’s memories live under a user-specific namespace (e.g., “Users, U1”), which can be further subdivided (profiles, preferences, projects).
The tutorial then demonstrates two retrieval modes. Exact retrieval uses get when the key is known. Broad retrieval uses search without a semantic query, but that can overwhelm the LLM if many memories exist. To address relevance, semantic search is introduced: the memory store is configured with an embedding model (using OpenAI’s “text-embedding-3-small”), and search accepts a natural-language query plus a limit, returning the closest matching memories by embedding similarity. This enables the chatbot to pull only the facts most relevant to the current user message.
Next comes the LangGraph integration. A “chat node” builds a system prompt that includes retrieved memories (merged into a single text block) and instructs the assistant to personalize responses using those details. A separate “remember” workflow adds the ability to create new memories during conversation: an “extractor LLM” produces structured output (via a Pydantic model) indicating whether the latest user message contains stable, user-specific information worth storing, and returns a list of memory strings to write. A major pitfall appears immediately: without deduplication, repeated messages create redundant memory entries. The fix is a deduplication strategy where the extractor LLM compares extracted candidate memories against existing ones and tags each as new or already present; only “new” items get stored.
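The extractor’s structured output and the dedup step can be sketched as below. The field names (`is_memory_worthy`, `memories`) are illustrative; in the video a schema like this is bound to the extractor LLM via structured output, and the “new vs. already present” tagging is itself done by the LLM—here a simple exact-match comparison stands in so the logic is runnable:

```python
# Sketch of the "remember" workflow: a Pydantic schema for the extractor's
# structured output, plus a deduplication step before writing to the store.
from pydantic import BaseModel

class ExtractedMemories(BaseModel):
    is_memory_worthy: bool   # does the message contain a stable user fact?
    memories: list[str]      # candidate memory strings to store

def dedup(candidates: list[str], existing: list[str]) -> list[str]:
    """Keep only candidates not already stored (exact-match stand-in for
    the LLM-based new/already-present tagging)."""
    seen = {m.strip().lower() for m in existing}
    return [c for c in candidates if c.strip().lower() not in seen]

# Simulated extractor output for the message "I prefer Python, by the way":
out = ExtractedMemories(is_memory_worthy=True,
                        memories=["User prefers Python."])

existing = ["User prefers Python.", "User works on a chatbot project."]
new_items = dedup(out.memories, existing) if out.is_memory_worthy else []
print(new_items)  # empty: the fact is already stored, so nothing is written
```

Without this filter, repeating “I prefer Python” in three threads would create three near-identical entries, which is exactly the redundancy pitfall the walkthrough demonstrates.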
Finally, the tutorial highlights why persistence matters. The RAM-based store loses everything after restart, so production-grade systems should use Postgres or Redis. The walkthrough shows how to run Postgres via Docker, wire the graph to a Postgres store, and verify persistence by restarting and confirming the memories remain available. The end result is a merged LangGraph workflow where the system both remembers (extracts and stores new stable facts) and chats (retrieves relevant memories and personalizes responses), with persistence handled by Postgres for real-world reliability.
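A setup sketch for the Postgres-backed store, assuming Docker is available; the container name, password, and database name are placeholders:

```shell
# Run Postgres locally in Docker (placeholder credentials).
docker run --name langgraph-memory \
  -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=memories \
  -p 5432:5432 -d postgres

# Wiring the graph to the store in Python (sketch):
#   from langgraph.store.postgres import PostgresStore
#   with PostgresStore.from_conn_string(
#           "postgresql://postgres:postgres@localhost:5432/memories") as store:
#       store.setup()                        # create tables on first run
#       graph = builder.compile(store=store) # same graph, durable memories
```

Restarting the process (or the container) and re-running `search` is the persistence check the walkthrough performs: the memories survive, unlike with the RAM-based store.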
Cornell Notes
Long-term memory in LangGraph is implemented by storing user-specific facts in a persistent memory backend, then retrieving relevant memories before generating each reply. A Base Store abstraction provides put/get/search operations, with concrete implementations like an in-memory store for prototyping and Postgres/Redis stores for production persistence. Memories are organized using namespaces (e.g., per-user folders such as Users, U1), and retrieval can be exact (get) or semantic (search with embeddings) to avoid dumping irrelevant facts into the LLM context. A “remember” workflow uses an extractor LLM with structured output (Pydantic models) to decide what to store, while deduplication prevents redundant entries. The combined workflow personalizes chat responses and keeps learned facts across restarts using Postgres.
- Why can’t a chatbot rely on a single conversation thread to learn a user’s preferences?
- What does the Base Store abstraction enable in LangGraph memory systems?
- How do namespaces improve memory organization and retrieval?
- Why is semantic search preferred over naive “fetch everything” retrieval?
- What is the purpose of the extractor LLM and structured output in the “remember” workflow?
- How does deduplication prevent redundant long-term memories?
Review Questions
- How do namespaces and keys work together to ensure a chatbot retrieves the right user’s memories?
- When would semantic search return better results than get or search-without-query, and why?
- What failure mode appears when the remember workflow stores memories repeatedly, and what deduplication mechanism prevents it?
Key Points
1. Long-term memory personalizes chatbot responses by extracting stable user facts from messages, storing them persistently, and retrieving relevant memories before answering.
2. LangGraph memory stores follow a Base Store abstraction with put/get/search operations, enabling consistent integration across different backends.
3. Namespaces partition memories by user (and optionally by category like Profile or Preferences), keeping retrieval scoped and organized.
4. Semantic search uses embeddings (e.g., OpenAI “text-embedding-3-small”) to fetch only meaningfully relevant memories, avoiding context overload.
5. A remember workflow uses an extractor LLM with structured output to decide what to store and to extract memory candidates reliably.
6. Deduplication is essential; repeated messages can otherwise create redundant memory entries, so candidates must be compared against existing memories before writing.
7. RAM-based memory stores are volatile; Postgres (or Redis) is required for persistence across restarts in production-grade systems.