Observability in LangGraph | LangSmith Integration with LangGraph
Based on CampusX's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
LangSmith observability for LangGraph captures end-to-end traces per chatbot turn, including LangGraph node execution, LLM input/output, token usage, and latency breakdowns.
Briefing
Observability for LangGraph agents becomes practical once every user turn is captured as an end-to-end trace in LangSmith—complete with timing, token usage, and the exact LangGraph node and LLM inputs/outputs involved. The payoff is straightforward: when something goes wrong (or just behaves oddly), developers can inspect what the user asked, what the model returned, how long each step took, and how many tokens were consumed—without guessing.
The walkthrough starts by positioning observability as a logging and tracing layer for LLM systems. A user chats with a LangGraph-based chatbot through a GUI, with streaming and database persistence already in place from earlier steps. The new addition is LangSmith integration, which records the full execution path on every run: the conversation messages, the LLM response, input and output token counts, latency metrics (including time to first token and overall generation time), and internal execution details such as which LangGraph node ran and which model handled the request (a "ChatOpenAI" model in the example). Once the environment variables are set, no manual changes are needed in the main execution path; LangSmith captures traces automatically behind the scenes.
Setup requires creating a LangSmith account, generating an API key, and adding a small set of environment variables (including a LangSmith endpoint, the API key, a boolean to enable tracing, and a project name). That project name becomes the organizing container in LangSmith’s dashboard. Running the existing chatbot code again produces a new “chatbot project” entry, where each user message/response pair appears as a trace. Clicking a trace reveals the execution timeline and structured details: start/end times, status, token usage, latency breakdown, and the LLM’s input/output payloads.
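The setup described above can be sketched as a short bootstrap snippet. The variable names below are the standard LangChain/LangSmith tracing variables; the API key and project name are placeholders to substitute with your own:

```python
import os

# Minimal LangSmith bootstrap. Values here are placeholders; the variable
# names are the standard LangChain/LangSmith tracing environment variables.
os.environ["LANGCHAIN_TRACING_V2"] = "true"                           # boolean flag that enables tracing
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"  # LangSmith endpoint
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"          # key generated in the LangSmith dashboard
os.environ["LANGCHAIN_PROJECT"] = "chatbot-project"                   # project name that appears in the dashboard
```

With these set before the chatbot code runs, traces land in the named project without any change to the invocation code itself.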
A key nuance emerges when multiple conversations are involved. Without extra configuration, separate conversations (like starting a new chat about "biryani" after an unrelated topic) end up as one flat list of traces in the project, with turns from different conversations interleaved. The fix is thread-level organization: pass a thread identifier (or session/conversation identifier) during invocation. LangSmith supports grouping traces into threads, but it requires a small addition to the chatbot backend: explicitly include a thread_id (or session_id/conversation_id) in the configuration passed to LangGraph/LangSmith.
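A minimal sketch of that backend change (the identifier value and the `chatbot` graph name are hypothetical; `configurable.thread_id` is what LangGraph's checkpointer reads, and LangSmith groups traces into a thread when the run metadata carries a `session_id`, `thread_id`, or `conversation_id` key):

```python
# One stable ID per conversation (hypothetical value; in practice this comes
# from the GUI's session handling).
thread_id = "user-42-chat-1"

config = {
    # LangGraph: the checkpointer uses this to persist and resume this
    # conversation's state across turns.
    "configurable": {"thread_id": thread_id},
    # LangSmith: a thread_id / session_id / conversation_id key in metadata
    # makes each trace appear under the matching thread in the dashboard.
    "metadata": {"thread_id": thread_id},
}

# Hypothetical invocation; "chatbot" would be the compiled LangGraph graph:
# response = chatbot.invoke({"messages": [("user", "Tell me about biryani")]}, config=config)
```

Reusing the same thread_id on every turn of a conversation is what keeps its traces together; a new conversation simply gets a fresh ID.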
The final part demonstrates the improved structure. With thread IDs added via metadata, LangSmith shows distinct threads for distinct conversations. Each thread contains the correct sequence of traces (one per turn), and each trace can be inspected individually for the exact user message, the AI reply, token usage, and latency. This thread-aware observability is framed as essential for scaling beyond toy examples, especially when adding more complex capabilities like tools, RAG, and MCP-style integrations, because it turns production debugging and performance analysis into a navigable dashboard rather than a log scavenger hunt.
Cornell Notes
LangSmith integration adds observability to LangGraph agents by capturing end-to-end traces for each chatbot turn. Once environment variables are set (including a LangSmith API key, endpoint, tracing enabled flag, and a project name), traces appear automatically in the LangSmith dashboard without changing the main code path. Each trace records the LangGraph node executed, the LLM model used (e.g., ChatOpenAI), the LLM input/output, token counts (input and output), and latency metrics (including first-token timing). To keep multiple conversations from mixing, traces should be grouped into separate LangSmith threads by passing a thread_id (or session_id/conversation_id) during invocation via configuration/metadata. With thread IDs, each conversation becomes a clean thread containing its ordered turn-by-turn traces, making debugging and performance review far easier.
What does “observability” mean in this LangGraph + LangSmith setup, and what gets recorded for each turn?
How does the integration get turned on, and why doesn’t the main code need major changes?
Where do traces show up in LangSmith, and how are they organized at first?
Why can conversation “threads” feel mismanaged without thread IDs, and what’s the fix?
What code-level change is required to enable thread organization, and how does it affect the dashboard?
Review Questions
- What specific metrics and payload details does LangSmith display inside a single trace for a chatbot turn?
- How does passing a thread_id (or session_id/conversation_id) change how traces are grouped in LangSmith?
- Why is thread-aware observability important when scaling from single-turn tests to multi-conversation production usage?
Key Points
1. LangSmith observability for LangGraph captures end-to-end traces per chatbot turn, including LangGraph node execution, LLM input/output, token usage, and latency breakdowns.
2. Environment-variable setup (API key, endpoint, tracing enabled flag, and project name) enables automatic tracing without major changes to the main chatbot code path.
3. Each user message/AI response pair becomes a separate trace inside the configured LangSmith project, making turn-level inspection possible.
4. Without thread IDs, separate conversations can appear confusingly stored; adding thread grouping resolves this by separating conversations into distinct threads.
5. Passing a thread_id (or session_id/conversation_id) via invocation configuration/metadata is the key code change to organize traces correctly.
6. Adding metadata like a clearer run name improves dashboard readability by making traces easier to interpret as "chat turns."
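The last point can be sketched concretely. `run_name` is a standard field of LangChain's RunnableConfig; the name and identifier below are hypothetical choices for illustration:

```python
thread_id = "user-42-chat-1"  # hypothetical per-conversation ID

config = {
    "run_name": "chat_turn",                   # trace is listed as "chat_turn" instead of a generic name
    "configurable": {"thread_id": thread_id},  # LangGraph state persistence
    "metadata": {"thread_id": thread_id},      # LangSmith thread grouping
}

# Hypothetical invocation against the compiled graph:
# response = chatbot.invoke({"messages": [("user", "hello")]}, config=config)
```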