
1-LangGraph Tutorial-Getting Started With Pydantic-Data Validations

Krish Naik · 5 min read

Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Define Pydantic BaseModel classes with typed fields to enforce runtime schemas for API and LLM outputs.

Briefing

Pydantic is positioned as the backbone for reliable structured outputs in LangGraph workflows—especially when data originates from APIs or LLMs and must match a strict schema. The core idea is straightforward: define a model (a class) with typed fields, then let Pydantic validate and coerce incoming data so downstream steps don’t break when types or formats drift. That matters in LangGraph because each node in a multi-step automation expects specific outputs; if an LLM returns a title as an integer or omits a required field, validation should fail fast rather than silently corrupt later steps.

The tutorial starts by grounding the need for validation in a common API scenario. An application receives JSON containing fields like name (string) and age (integer). Without a schema, the application has no guarantee the payload matches the expected types. With Pydantic, developers define a class such as Person with fields typed as str and int; when an instance is created, Pydantic enforces those constraints at runtime. The speed claim is tied to implementation details: Pydantic’s validation engine is written in Rust, making the validation process faster than many pure-Python alternatives.
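The Person example described above can be sketched as follows (field values here are illustrative):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

# A well-formed payload is accepted and typed at runtime.
p = Person(name="Alice", age=30)

# A payload with a non-numeric age fails fast instead of
# silently propagating a bad value downstream.
try:
    Person(name="Alice", age="thirty")
except ValidationError as e:
    print(e)  # reports that age should be a valid integer
```

Note that a castable value such as `age="30"` would be coerced to `30`; only genuinely incompatible inputs raise an error.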

That same mechanism becomes crucial when moving from simple APIs to LLM-driven automation. The example workflow is an automated pipeline where uploading a YouTube video triggers a sequence: extract content, call an LLM, and generate a blog—without human intervention. In LangGraph terms, each node produces structured outputs. If the LLM is expected to return fields like title and description (both strings), Pydantic models can enforce that structure. The tutorial emphasizes that this is how structured output validation is achieved: the LLM is guided to output data that conforms to a Pydantic-defined schema, and Pydantic verifies it.
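A hedged sketch of that validation step, using a hypothetical BlogPost model with the title/description fields described above (no LLM call is made here; the dict stands in for parsed model output):

```python
from pydantic import BaseModel, ValidationError

class BlogPost(BaseModel):
    title: str
    description: str

# Simulated, well-formed LLM output parsed from JSON:
llm_output = {"title": "Intro to LangGraph",
              "description": "A first look at graph workflows."}
post = BlogPost(**llm_output)

# Drifted output (an integer title) is rejected before it can
# reach the next node in the workflow.
try:
    BlogPost(title=42, description="...")
except ValidationError as e:
    print(e)  # input should be a valid string
```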

After establishing why Pydantic matters, the walkthrough shifts into hands-on setup and core modeling patterns. It demonstrates creating a simple BaseModel subclass (Person) and shows that incorrect types trigger a validation error (e.g., providing an integer where a string is required). It contrasts this with Python dataclasses, noting that dataclasses don’t perform the same runtime type validation. The tutorial then expands into practical features used in real schemas: optional fields (using Optional with a default of None), automatic type casting (e.g., converting numeric strings to integers when appropriate), list-typed fields (e.g., list[str]) with validation errors when elements don’t match the declared type, and nested models (building an Address model inside a Customer model).

The final portion introduces field-level customization via Field() constraints. It shows how to enforce minimum/maximum string lengths and numeric ranges (e.g., price must be > 0 and < 1000, quantity must fall within set bounds). It also covers adding metadata like descriptions and default values, then generating a schema for documentation and integration. The tutorial notes that schema generation via schema() is deprecated in favor of model_json_schema(). The session ends by previewing that later parts will connect these Pydantic patterns directly into LangGraph workflows.

Cornell Notes

Pydantic is presented as the runtime schema validator that makes structured LLM and API outputs dependable. By defining a BaseModel subclass with typed fields, Pydantic validates incoming data and raises errors when types don’t match, while also performing useful type casting. The tutorial demonstrates core patterns: required vs optional fields, list fields with element-level validation, nested models, and field() constraints for min/max lengths and numeric ranges. It also shows how to attach descriptions and defaults and generate an integration-ready JSON schema using model_json_schema(). This matters for LangGraph because each node in a multi-step workflow needs predictable, correctly typed outputs.

Why does Pydantic matter for LangGraph workflows that rely on LLM outputs?

LangGraph nodes typically expect structured outputs. When an LLM generates data (e.g., a blog title and description), Pydantic models can enforce that the returned fields match the declared types (title: str, description: str). If the LLM returns an integer for a string field or omits required fields, Pydantic raises a validation error immediately, preventing downstream nodes from operating on malformed data.

How does Pydantic validation differ from using Python dataclasses for the same “Person” model?

A dataclass can store fields, but it doesn’t automatically validate runtime types the way Pydantic does. In the tutorial, Person is defined with name: str and age: int. When an instance is created with name as an integer, Pydantic produces a validation error (“string type” expected). The dataclass version accepts values without the same runtime enforcement.
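The contrast can be shown side by side (a minimal sketch; the dataclass variant name PersonDC is mine):

```python
from dataclasses import dataclass
from pydantic import BaseModel, ValidationError

@dataclass
class PersonDC:
    name: str
    age: int

class Person(BaseModel):
    name: str
    age: int

# The dataclass happily stores the wrong types -- hints are not enforced.
dc = PersonDC(name=123, age="x")
print(type(dc.name))  # <class 'int'>

# Pydantic raises at construction time instead.
try:
    Person(name=123, age=30)
except ValidationError as e:
    print(e)  # input should be a valid string
```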

What does Optional[...] with a default of None do in Pydantic models?

Optional fields allow the value to be None (null). The tutorial defines an Employee model with salaries: Optional[float] = None and active: bool = True. When salaries isn’t provided, the model sets it to None. When salaries is provided as a number (or a castable value), Pydantic validates it as a float.
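A minimal sketch of that Employee model, keeping the tutorial's salaries field name:

```python
from typing import Optional
from pydantic import BaseModel

class Employee(BaseModel):
    name: str
    salaries: Optional[float] = None  # missing values default to None
    active: bool = True

# Omitted optional field -> None.
e1 = Employee(name="Dana")
print(e1.salaries)  # None

# A castable numeric string is validated and coerced to float.
e2 = Employee(name="Dana", salaries="55000.50")
print(e2.salaries)  # 55000.5
```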

How are list fields validated in Pydantic?

For list fields, the element type is enforced. The tutorial defines a Classroom model with students: list[str]. Passing students as a list of strings works (e.g., ["Alice", "Bob", "Charlie"]). If students contains non-string elements (e.g., [1, 2, 3]), Pydantic raises a validation error indicating the input should be a valid string.
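Sketching the Classroom example:

```python
from pydantic import BaseModel, ValidationError

class Classroom(BaseModel):
    students: list[str]

# Element types match the declaration, so this validates.
room = Classroom(students=["Alice", "Bob", "Charlie"])

# Non-string elements fail element-level validation.
try:
    Classroom(students=[1, 2, 3])
except ValidationError as e:
    print(e)  # each element: input should be a valid string
```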

What is the benefit of nested Pydantic models?

Nested models let complex JSON structures be represented cleanly and validated recursively. The tutorial creates an Address model (street, city, zip_code) and a Customer model that includes address: Address. When Customer is instantiated with address as a dictionary, Pydantic validates the nested fields. It also demonstrates type casting, such as converting zip_code provided as a string into an integer when the field type is int.
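A sketch of the nested models (the sample values are illustrative):

```python
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    zip_code: int

class Customer(BaseModel):
    name: str
    address: Address

# The nested dict is validated recursively; the string zip_code
# is cast to int because the field type is int.
c = Customer(name="Ravi",
             address={"street": "1 Main St",
                      "city": "Bengaluru",
                      "zip_code": "560001"})
print(c.address.zip_code)        # 560001
print(type(c.address.zip_code))  # <class 'int'>
```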

How do Field() constraints and schema generation help integration?

Field() adds validation rules beyond basic type hints. The tutorial constrains strings with min_length and max_length, and numbers with bounds such as gt (greater than) and lt (less than). It also attaches metadata like descriptions and default values. For integration and documentation, it generates a JSON schema using model_json_schema() (noting that schema() is deprecated), so developers can see exactly what an API or LLM output should look like.
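A hedged sketch of those constraints using Pydantic v2's Field and model_json_schema; the Item model name is mine, but the bounds mirror the ranges mentioned above:

```python
import json
from pydantic import BaseModel, Field, ValidationError

class Item(BaseModel):
    name: str = Field(min_length=2, max_length=50,
                      description="Item name")
    price: float = Field(gt=0, lt=1000,
                         description="Price, must be > 0 and < 1000")
    quantity: int = Field(default=1, ge=1, le=100)

Item(name="Pen", price=2.5)  # valid; quantity defaults to 1

# Out-of-range price violates the gt=0 constraint.
try:
    Item(name="Pen", price=-1)
except ValidationError as e:
    print(e)  # input should be greater than 0

# Integration-ready JSON schema (schema() is deprecated in v2):
print(json.dumps(Item.model_json_schema(), indent=2))
```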

Review Questions

  1. In what situations would Pydantic raise a validation error when used with an LLM node output in LangGraph?
  2. How do Optional fields change validation behavior compared with required fields in Pydantic?
  3. Give one example of a field() constraint (string or numeric) and explain what inputs would fail it.

Key Points

  1. Define Pydantic BaseModel classes with typed fields to enforce runtime schemas for API and LLM outputs.
  2. Pydantic validation fails fast when types don’t match (e.g., string fields receiving integers), preventing corrupted downstream workflow steps.
  3. Optional fields declared with Optional[...] and default None allow missing values while still validating types when values are provided.
  4. List fields enforce element-level types (e.g., list[str] rejects lists containing non-strings).
  5. Nested models validate complex JSON structures recursively, including automatic type casting when compatible.
  6. Use Field() constraints to add min/max lengths and numeric ranges on top of basic type hints.
  7. Generate integration-ready documentation with model_json_schema() so consumers know the expected structured output format.

Highlights

Pydantic turns “expected JSON structure” into enforceable runtime rules by validating typed BaseModel fields.
LangGraph node outputs become safer when each node’s structured output is validated against a Pydantic schema.
Field() constraints let developers specify real-world rules like min_length/max_length and numeric bounds, not just data types.
Nested Pydantic models validate multi-level objects (like Customer → Address) and can cast compatible types automatically.

Topics

  • Pydantic Data Validation
  • LangGraph Structured Output
  • BaseModel Schemas
  • Optional Fields
  • Field Constraints
  • Nested Models
