LLM JSON Output - Get Valid JSON with Pydantic and LangChain Output Parsers
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Getting reliable JSON from large language models—especially ones that don’t natively support structured outputs—requires more than “please output JSON.” The core approach here is to pair a strict schema (via Pydantic) with output-parsing logic (via LangChain-style parsers), and to enforce “pure JSON only” formatting so downstream code can safely consume the result.
The walkthrough starts with Groq's API, which can be asked for a JSON object directly. Using Groq's client, it sets an API key and selects a model (defaulting to Llama 3 70B). A custom predict function builds a messages array (optionally prepending a system prompt), calls the chat completions endpoint, and, when JSON output is requested, passes a parameter that tells the API to return a JSON object rather than free-form text. For models that support this mode, the workflow is straightforward: include a system prompt that demands JSON, provide a sample JSON shape, and set the response format to JSON. The result is a response that can be printed as JSON and parsed without extra cleanup.
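A minimal sketch of this predict function, using Groq's OpenAI-compatible Python SDK. The model name, prompt wording, and helper names here are illustrative assumptions, not verbatim from the video:

```python
def build_messages(prompt, system_prompt=None):
    """Assemble the messages array, optionally prepending a system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages


def predict(prompt, system_prompt=None, json_mode=False, model="llama3-70b-8192"):
    """Call Groq's chat completions endpoint; request a JSON object when asked."""
    from groq import Groq  # requires `pip install groq` and a GROQ_API_KEY env var

    client = Groq()
    kwargs = {"model": model, "messages": build_messages(prompt, system_prompt)}
    if json_mode:
        # Tells the API to return a JSON object instead of free-form text.
        kwargs["response_format"] = {"type": "json_object"}
    response = client.chat.completions.create(**kwargs)
    return response.choices[0].message.content
```

With `json_mode=True`, the returned string can be passed straight to `json.loads` without stripping wrappers.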
The more fragile case is when the model only returns text. For that, the method shifts to a two-step pipeline: (1) generate text that contains JSON in a predictable wrapper, then (2) extract and validate it against a Pydantic schema. The example defines a Pydantic BaseModel with two required fields—readability and conciseness—both scored from 0 to 10. It then leverages LangChain/Ragas-style schema prompting patterns: the prompt includes the schema, instructs the model to return only a pure JSON string (no preamble or explanation), and often specifies that the JSON should be surrounded by triple backticks. After receiving the model output, the code strips the backticks, parses the remaining string into a Python dictionary, and uses Pydantic’s parsing/validation to ensure the fields and types match.
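The fallback pipeline above can be sketched as follows. The field names match the transcript (readability and conciseness, both required, 0 to 10); the class and helper names are illustrative:

```python
import json
import re

from pydantic import BaseModel, Field


class TweetEvaluation(BaseModel):
    """Schema for the expected output; out-of-range or missing fields fail validation."""
    readability: int = Field(ge=0, le=10)
    conciseness: int = Field(ge=0, le=10)


def parse_evaluation(raw: str) -> TweetEvaluation:
    """Strip a triple-backtick wrapper if present, then parse and validate."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = match.group(1) if match else raw.strip()
    return TweetEvaluation(**json.loads(payload))


result = parse_evaluation('```json\n{"readability": 8, "conciseness": 7}\n```')
```

A response like `{"readability": 12, ...}` or one missing a field raises a Pydantic `ValidationError` instead of silently propagating bad data downstream.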
A key practical detail is the prompt engineering itself. The “text-only” prompt is long and prescriptive: it repeats the evaluation task (scoring tweet writing style for readability and conciseness), includes both a correctly formatted example and a negative example where the JSON object properties are not well formatted, and ends with explicit instructions to output JSON only. In the example, running this prompt against Llama 3 yields a response that can be cleaned (removing the backticks) and parsed into the Pydantic model.
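An illustrative reconstruction of that prescriptive prompt; the exact wording in the video differs, but the structure (task, positive example, negative example, JSON-only instruction) is the same:

```python
FENCE = "`" * 3  # triple backticks, built indirectly so they don't clash with this code block


def evaluation_prompt(tweet: str) -> str:
    """Build the prescriptive 'text-only' evaluation prompt for a given tweet."""
    return f"""Evaluate the writing style of the tweet below.
Score readability and conciseness, each from 0 to 10.

Return ONLY a pure JSON string surrounded by triple backticks.
Do not add a preamble, explanation, or any other text.

Example of a correctly formatted response:
{FENCE}json
{{"readability": 7, "conciseness": 9}}
{FENCE}

Example of an incorrectly formatted response (properties not well formatted):
{FENCE}json
{{readability: "7 out of 10", Conciseness: nine}}
{FENCE}

Tweet:
{tweet}
"""
```

The negative example gives the model a concrete anti-pattern to avoid, which tends to improve adherence more than the instruction alone.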
Finally, the workflow can be simplified further by using LangChain’s Pydantic output parser directly. When integrated into LangChain chains, that parser can also trigger a repair loop: if parsing fails due to invalid JSON, LangChain can call the model again with instructions to fix the output. The end result is a reusable pattern: use native JSON support when available; otherwise, enforce schema-driven prompting plus strict parsing/validation so applications can reliably consume structured LLM outputs.
Cornell Notes
The transcript presents a practical method for getting valid JSON from LLMs, even when they don’t support structured outputs natively. When Groq’s API supports JSON mode, a system prompt plus a JSON response format yields directly parseable JSON. For text-only models, the method switches to schema-driven prompting using Pydantic: define a model (e.g., readability and conciseness, both required), instruct the LLM to output pure JSON (often wrapped in triple backticks), then strip wrappers and parse/validate with Pydantic. LangChain’s Pydantic output parser can further improve reliability by re-asking the LLM to repair invalid JSON when parsing fails.
Why does JSON-only prompting often fail with smaller or older models, and what workaround is used here?
How does the Groq-based approach produce JSON without manual extraction?
What role does Pydantic play in the text-only fallback strategy?
Why include examples (including a negative example) inside the prompt?
How does LangChain’s Pydantic output parser improve reliability beyond basic parsing?
Review Questions
- What changes when moving from a model/API that supports JSON response format to one that only returns text?
- How do Pydantic schema requirements (field names and required-ness) influence the parsing and validation step?
- What specific prompt constraints (e.g., “pure JSON only,” backticks, examples) are used to reduce malformed outputs?
Key Points
1. Use native JSON response formatting when the API/model supports it; it reduces cleanup and parsing errors.
2. For text-only models, generate JSON inside a predictable wrapper (often triple backticks) and then strip the wrapper before parsing.
3. Define a strict Pydantic schema for the expected fields (e.g., readability and conciseness) so invalid outputs fail validation instead of silently propagating.
4. Use schema-driven prompting (include the expected structure and required fields) to guide the model toward correct JSON formatting.
5. Add explicit “JSON only” instructions and include both correct and incorrect formatting examples to improve adherence.
6. When using LangChain, rely on its Pydantic output parser and repair behavior to re-ask for corrected JSON if parsing fails.