Long Term Memory in LangGraph

CampusX · 5 min read

Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Long-term memory personalizes chatbot responses by extracting stable user facts from messages, storing them persistently, and retrieving relevant memories before answering.

Briefing

Long-term memory is the missing ingredient for chatbots that feel personal over time: instead of treating every conversation as brand-new, the system extracts user-specific facts from messages, stores them in a persistent memory backend, and consults that memory before generating each reply. In this LangGraph walkthrough, the core idea is straightforward but production-relevant: keep a per-user “memory store” that survives across sessions, then inject the most relevant stored details into the LLM’s prompt so responses match the user’s preferences, projects, and ongoing context.

The walkthrough starts by recapping why long-term memory matters in multi-thread chat experiences. Users rarely share everything in one place; they reveal different pieces of identity and preferences across separate conversations—like programming language choices in one thread, travel plans in another, and professional or philosophical interests elsewhere. A long-term memory store solves the fragmentation problem by persistently saving extracted facts and retrieving them later. The LLM then checks the memory store before answering, using stored preferences to tailor outputs—for example, generating code in Python after learning that the user prefers Python.

From there, the implementation shifts into LangGraph’s memory-store architecture. The key abstraction is an abstract “Base Store” class that defines the core operations: create new memories (put), fetch a specific memory (get), search across memories (search), update (edit), and delete. Concrete implementations inherit from this base: an in-memory store for quick prototyping, a Postgres-backed store for production persistence, and a Redis-backed store as another production option. The practical lesson is that long-term memory in LangGraph is organized around a namespacing scheme—effectively folder-like partitions—so each user’s memories live under a user-specific namespace (e.g., “Users, U1”) and can be further subdivided (profiles, preferences, projects).
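
As a minimal sketch of those operations (the key names and stored values here are illustrative, not taken from the video), LangGraph’s InMemoryStore can be exercised directly:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()  # swap for a Postgres/Redis-backed store in production

# Namespaces are tuple paths, like folders; keys identify items inside them.
ns = ("users", "u1")

# put creates (or overwrites) a memory.
store.put(ns, "preferences", {"text": "Prefers Python for all code examples"})

# get fetches one memory when its key is known.
item = store.get(ns, "preferences")
print(item.value)  # {'text': 'Prefers Python for all code examples'}

# put to the same key edits in place; delete removes the item.
store.put(ns, "preferences", {"text": "Prefers Python and type hints"})
store.delete(ns, "preferences")
```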

The tutorial then demonstrates two retrieval modes. Exact retrieval uses get when the key is known. Broad retrieval uses search without a semantic query, but that can overwhelm the LLM if many memories exist. To address relevance, semantic search is introduced: the memory store is configured with an embedding model (using OpenAI’s “text-embedding-3-small”), and search accepts a natural-language query plus a limit, returning the closest matching memories by embedding similarity. This enables the chatbot to pull only the facts most relevant to the current user message.
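
A sketch of that setup, assuming the langchain-openai package and illustrative memory contents; InMemoryStore accepts an index config pairing an embedding model with its vector dimensionality:

```python
from langchain_openai import OpenAIEmbeddings
from langgraph.store.memory import InMemoryStore

# Indexing the store with an embedding model enables semantic search.
store = InMemoryStore(
    index={
        "embed": OpenAIEmbeddings(model="text-embedding-3-small"),
        "dims": 1536,  # output dimensionality of text-embedding-3-small
    }
)

ns = ("users", "u1")
store.put(ns, "m1", {"text": "Prefers Python over JavaScript"})
store.put(ns, "m2", {"text": "Planning a trip to Japan in spring"})

# A natural-language query plus a limit returns only the closest memories.
for item in store.search(ns, query="what language should code examples use?", limit=1):
    print(item.value["text"])  # -> "Prefers Python over JavaScript"
```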

Next comes the LangGraph integration. A “chat node” builds a system prompt that includes retrieved memories (merged into a single text block) and instructs the assistant to personalize responses using those details. A separate “remember” workflow adds the ability to create new memories during conversation: an “extractor LLM” produces structured output (via a Pydantic model) indicating whether the latest user message contains stable, user-specific information worth storing, and returns a list of memory strings to write. A major pitfall appears immediately: without deduplication, repeated messages create redundant memory entries. The fix is a deduplication strategy where the extractor LLM compares extracted candidate memories against existing ones and tags each as new or already present; only “new” items get stored.
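
The extractor half of that workflow might look roughly like this; should_write comes from the walkthrough, while the chat model choice and remaining field names are assumptions for illustration:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Structured-output schema forcing a boolean decision plus memory strings.
class MemoryExtraction(BaseModel):
    should_write: bool = Field(description="True if the message contains stable, user-specific facts")
    memories: list[str] = Field(default_factory=list, description="Facts worth storing long-term")

extractor = ChatOpenAI(model="gpt-4o-mini").with_structured_output(MemoryExtraction)

result = extractor.invoke(
    "Extract stable, user-specific facts from this message, if any:\n"
    "'I always write my backend services in Python.'"
)
if result.should_write:
    for i, fact in enumerate(result.memories):
        store.put(("users", "u1"), f"fact_{i}", {"text": fact})
```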

Finally, the tutorial highlights why persistence matters. The RAM-based store loses everything after restart, so production-grade systems should use Postgres or Redis. The walkthrough shows how to run Postgres via Docker, wire the graph to a Postgres store, and verify persistence by restarting and confirming the memories remain available. The end result is a merged LangGraph workflow where the system both remembers (extracts and stores new stable facts) and chats (retrieves relevant memories and personalizes responses), with persistence handled by Postgres for real-world reliability.
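
A sketch of that persistence wiring, assuming the langgraph-checkpoint-postgres package and an illustrative connection string:

```python
# Assumes a local Postgres instance, started for example with:
#   docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres
from langgraph.store.postgres import PostgresStore

conn_string = "postgresql://postgres:postgres@localhost:5432/postgres"  # illustrative

with PostgresStore.from_conn_string(conn_string) as store:
    store.setup()  # creates the store's tables on first use
    store.put(("users", "u1"), "preferences", {"text": "Prefers Python"})

# After a process restart, the same memory is still retrievable:
with PostgresStore.from_conn_string(conn_string) as store:
    print(store.get(("users", "u1"), "preferences").value)
```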

Cornell Notes

Long-term memory in LangGraph is implemented by storing user-specific facts in a persistent memory backend, then retrieving relevant memories before generating each reply. A Base Store abstraction provides put/get/search operations, with concrete implementations like an in-memory store for prototyping and Postgres/Redis stores for production persistence. Memories are organized using namespaces (e.g., per-user folders such as Users, U1), and retrieval can be exact (get) or semantic (search with embeddings) to avoid dumping irrelevant facts into the LLM context. A “remember” workflow uses an extractor LLM with structured output (Pydantic models) to decide what to store, while deduplication prevents redundant entries. The combined workflow personalizes chat responses and keeps learned facts across restarts using Postgres.

Why can’t a chatbot rely on a single conversation thread to learn a user’s preferences?

Users distribute important information across multiple threads over time. One conversation may reveal technical preferences (e.g., Python), another may contain travel plans (e.g., a future trip), and another may show professional identity (e.g., teaching on YouTube). Long-term memory addresses this by extracting those stable facts from each message and storing them in a persistent memory store so later replies can be personalized even when the user doesn’t restate everything.

What does the Base Store abstraction enable in LangGraph memory systems?

Base Store defines the core memory lifecycle operations: create new memories (put), fetch a specific memory (get), search across memories (search), and also supports update/edit and delete. Concrete stores inherit these capabilities, letting the rest of the system treat memory consistently while swapping storage backends (RAM for prototyping, Postgres for persistence, Redis as another production option).

How do namespaces improve memory organization and retrieval?

Namespaces act like folder partitions inside the memory store. Instead of mixing all users’ facts together, the system writes each user’s memories under a user-specific namespace (e.g., Users, U1). This keeps retrieval scoped to the current user and supports further sub-organization (like Users, U1, Profile or Users, U1, Preferences). The put/get/search calls always include the namespace so the correct subset of memories is targeted.
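
A short sketch of that scoping, with illustrative namespaces, keys, and values:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Sub-namespaces keep categories separate under the same user.
store.put(("users", "u1", "profile"), "bio", {"text": "Teaches machine learning on YouTube"})
store.put(("users", "u1", "preferences"), "lang", {"text": "Prefers Python"})
store.put(("users", "u2", "preferences"), "lang", {"text": "Prefers JavaScript"})

# Reads are scoped by namespace, so u1's lookup never sees u2's data.
print(store.get(("users", "u1", "preferences"), "lang").value)  # {'text': 'Prefers Python'}

# search takes a namespace prefix, so one call can span all of u1's sub-folders.
for item in store.search(("users", "u1")):
    print(item.namespace, item.key)
```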

Why is semantic search preferred over naive “fetch everything” retrieval?

If a user has many stored memories, pulling all of them into the LLM context can confuse the model and reduce response quality. Semantic search uses embeddings to match the current query’s meaning against stored memory embeddings, returning only the closest relevant items. In the walkthrough, search is configured with OpenAI’s embedding model “text-embedding-3-small” and uses a query plus a limit to return the top matching memories.

What is the purpose of the extractor LLM and structured output in the “remember” workflow?

The extractor LLM decides whether the latest user message contains stable, user-specific information worth storing. Structured output (via a Pydantic model) forces the LLM to return a boolean like should_write (true/false) and a list of memory strings when appropriate. This prevents storing transient or irrelevant details and keeps memory writes controlled and predictable.

How does deduplication prevent redundant long-term memories?

Without deduplication, repeated messages can create duplicate memory entries. The fix is to have the extractor LLM compare newly extracted candidate memories against existing stored memories and label each candidate as new or already present (using a memory item model that includes a boolean). Only candidates marked as new are written to the store, eliminating duplicates across repeated runs.
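
A hedged sketch of that comparison step; the walkthrough’s memory item model includes a boolean, but the exact field names (text, is_new) here are assumptions:

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

# Each candidate carries a boolean marking whether it is genuinely new.
class MemoryItem(BaseModel):
    text: str
    is_new: bool = Field(description="False if this fact duplicates an existing memory")

class DedupedExtraction(BaseModel):
    memories: list[MemoryItem]

extractor = ChatOpenAI(model="gpt-4o-mini").with_structured_output(DedupedExtraction)

existing = ["User prefers Python for all code examples"]
result = extractor.invoke(
    f"Existing memories: {existing}\n"
    "Latest message: 'Remember, I like Python. Also, I am planning a trip to Japan.'\n"
    "Return each candidate fact and mark is_new=False when it duplicates an existing memory."
)

new_facts = [m.text for m in result.memories if m.is_new]  # only these get written
```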

Review Questions

  1. How do namespaces and keys work together to ensure a chatbot retrieves the right user’s memories?
  2. When would semantic search return better results than get or search-without-query, and why?
  3. What failure mode appears when the remember workflow stores memories repeatedly, and what deduplication mechanism prevents it?

Key Points

  1. Long-term memory personalizes chatbot responses by extracting stable user facts from messages, storing them persistently, and retrieving relevant memories before answering.

  2. LangGraph memory stores follow a Base Store abstraction with put/get/search operations, enabling consistent integration across different backends.

  3. Namespaces partition memories by user (and optionally by category like Profile or Preferences), keeping retrieval scoped and organized.

  4. Semantic search uses embeddings (e.g., OpenAI “text-embedding-3-small”) to fetch only meaningfully relevant memories, avoiding context overload.

  5. A remember workflow uses an extractor LLM with structured output to decide what to store and to extract memory candidates reliably.

  6. Deduplication is essential; repeated messages can otherwise create redundant memory entries, so candidates must be compared against existing memories before writing.

  7. RAM-based memory stores are volatile; Postgres (or Redis) is required for persistence across restarts in production-grade systems.

Highlights

Long-term memory works by injecting retrieved user-specific facts into the LLM’s prompt so replies reflect preferences learned across separate conversations.
Semantic search prevents the “dump everything” problem by returning top matching memories using embedding similarity and a limit parameter.
The remember workflow can store new facts in real time, but it must include deduplication to avoid redundant memory growth.
RAM-backed memory loses data after restart; Postgres-backed storage preserves memories across sessions and restarts.
