Tagging and Extraction - Classification using OpenAI Functions
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use OpenAI function-style schemas in LangChain to obtain structured JSON outputs without triggering external functions.
Briefing
OpenAI “functions” can be used in LangChain not to trigger external code, but to force large language models to return structured JSON outputs, turning messy text into reliable fields. LangChain splits this into two built-in capabilities, tagging (classification) and extraction (entity/field extraction), which let developers define a schema up front and then constrain what the model is allowed to output.
For tagging, the workflow starts with a schema describing the labels to predict. A tagging chain is created with an LLM (the transcript references the GPT-3.5 Turbo snapshot from June 13th, i.e., gpt-3.5-turbo-0613) with temperature set to zero. The schema in the example asks for three outputs from each review: sentiment (positive/negative), stars (a rating), and language (English/Spanish/French/German). Under the hood, the model receives both a human prompt template and a functions-role payload that includes the schema. A JSON output functions parser then converts the model’s response into structured data automatically.
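A minimal sketch of this setup, assuming the legacy `create_tagging_chain` helper shown in the video (these chain-style imports and the `run` API have since been deprecated in newer LangChain releases, so treat the module paths as era-specific):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_tagging_chain

# Loose first-pass schema: the fields are listed but nothing is
# required or constrained, which is what leads to the gaps described below.
schema = {
    "properties": {
        "sentiment": {"type": "string"},
        "stars": {"type": "integer"},
        "language": {"type": "string"},
    }
}

# Temperature 0 keeps the classification deterministic;
# "gpt-3.5-turbo-0613" is the June 13th snapshot the transcript mentions.
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

chain = create_tagging_chain(schema, llm)
print(chain.run("Not worth the money. Very disappointed with this book."))
# e.g. {'sentiment': 'negative'} -- note the missing stars/language fields
```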
Initial tests on Amazon reviews from the book “Spare” show what happens when the schema is too loose. One five-star review returns positive sentiment and a stars value of four, but leaves the language field blank. A negative review returns sentiment but omits stars and language. A mixed review returns nothing at all. The takeaway is practical: when the model isn’t sufficiently constrained—especially for required fields—outputs can be incomplete or fail.
Tightening the schema fixes much of the problem. The sentiment field becomes an enum with explicit allowed values (positive, neutral, negative). Stars are constrained to a numeric range (one to five). Language is no longer a free-form guess; it is an enum limited to specific options (Spanish, English, French, German), and all three fields are marked as required. With these constraints, the chain returns complete structured results for the same reviews, including sentiment, stars, and language, as a Python dictionary. The transcript also demonstrates a Pydantic-based variant (`create_tagging_chain_pydantic`), where the output conforms to a typed Pydantic class, making it easier to access fields like response.sentiment and response.stars and to pass the object through downstream code.
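The tightened schema and the Pydantic variant might look like the following; the enum values and field names mirror the transcript, while the `Field(..., enum=...)` pattern assumes the Pydantic v1 behavior LangChain relied on at the time:

```python
from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel, Field  # assumes Pydantic v1 semantics

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

# Tightened dict schema: enums, a fixed star range, and required fields.
schema = {
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "stars": {"type": "integer", "enum": [1, 2, 3, 4, 5]},
        "language": {"type": "string", "enum": ["Spanish", "English", "French", "German"]},
    },
    "required": ["sentiment", "stars", "language"],
}
review = "Una lectura fascinante, cinco estrellas."  # hypothetical sample input
print(create_tagging_chain(schema, llm).run(review))
# e.g. {'sentiment': 'positive', 'stars': 5, 'language': 'Spanish'}

# Pydantic variant: the same constraints expressed as a typed class.
class ReviewTags(BaseModel):
    sentiment: str = Field(..., enum=["positive", "neutral", "negative"])
    stars: int = Field(..., description="Star rating from 1 to 5")
    language: str = Field(..., enum=["Spanish", "English", "French", "German"])

response = create_tagging_chain_pydantic(ReviewTags, llm).run(review)
print(response.sentiment, response.stars)  # typed attribute access
```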
Extraction works similarly but targets pulling specific fields from longer text, akin to named entity recognition. Using a TechCrunch article about controversy at Reddit, an extraction chain is built with a schema for fields such as person name, startup, news outlet, app name, and month. Only person name and startup are required in the initial setup. The functions role is used to request “information extraction,” and the model can call the function multiple times across the passage, producing a list of JSON objects.
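A hedged sketch of that extraction setup, again using the legacy `create_extraction_chain` helper (the field names follow the transcript; the snake_case keys are an assumption):

```python
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI

schema = {
    "properties": {
        "person_name": {"type": "string"},
        "startup": {"type": "string"},
        "news_outlet": {"type": "string"},
        "app_name": {"type": "string"},
        "month": {"type": "string"},
    },
    # Only these two fields must appear in every extracted object.
    "required": ["person_name", "startup"],
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_extraction_chain(schema, llm)

article_text = """(paste the TechCrunch article text here)"""

# The model may call the extraction function several times over the
# passage, so the result is a list of JSON objects (Python dicts).
entities = chain.run(article_text)
```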
The example highlights common extraction pitfalls: the model may confuse entity types (e.g., treating “Reddit” as a person), misclassify startups as apps and vice versa, and miss the month field. A key mitigation is context management: splitting the article into smaller chunks (e.g., two paragraphs at a time) improves coverage and reduces missed entities. The transcript also notes that adding descriptions to schema fields and using in-context learning examples can further improve accuracy.
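Continuing from the extraction sketch above (reusing its `chain` and `article_text`), one simple way to apply the chunking mitigation; this is a sketch, not the transcript’s exact code:

```python
# Split on blank lines and feed the chain two paragraphs at a time,
# so each call works over a small, focused context window.
paragraphs = [p for p in article_text.split("\n\n") if p.strip()]

all_entities = []
for i in range(0, len(paragraphs), 2):
    chunk = "\n\n".join(paragraphs[i:i + 2])
    all_entities.extend(chain.run(chunk))

# De-duplicate identical entity dicts extracted from overlapping mentions.
unique_entities = [dict(t) for t in {tuple(sorted(d.items())) for d in all_entities}]
```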
Overall, the approach turns unstructured text into structured data using constrained function-style outputs, enabling downstream uses like sentiment analysis, routing different complaint types to different chains, and building knowledge graphs from extracted entities.
Cornell Notes
LangChain’s “tagging” and “extraction” chains use OpenAI function-style schemas to force structured JSON outputs from unstructured text. Tagging behaves like multi-class classification: define allowed labels (enums), mark required fields, and the model returns sentiment, star ratings, and language in a consistent structure. Loose schemas lead to missing fields or empty outputs; tightening constraints (enums, required fields, numeric ranges) improves reliability. Extraction pulls named fields from articles using a similar functions role, often returning multiple JSON objects per passage. Accuracy improves with better schema descriptions and by limiting context size (e.g., processing two paragraphs at a time).
- Why did the initial tagging results come back incomplete or empty, even though the schema listed sentiment, stars, and language?
- How do enums and required fields change tagging reliability?
- What’s the difference between returning a dictionary and returning a Pydantic object in tagging?
- How does extraction resemble NER, and what does it return?
- What concrete strategy improved extraction results in the example article?
Review Questions
- When tagging, what specific schema changes (enums, required fields, numeric ranges) most directly prevent missing outputs?
- In extraction, why might the model label “Reddit” as a person, and how could schema descriptions or context chunking reduce such errors?
- How would you decide whether to use a plain dictionary output or a Pydantic-typed output for downstream processing?
Key Points
1. Use OpenAI function-style schemas in LangChain to obtain structured JSON outputs without triggering external functions.
2. Tagging is classification: define the fields to predict and constrain allowed values using enums and required fields.
3. Loose schemas often produce partial or empty outputs; tightening constraints improves completeness and consistency.
4. A Pydantic tagging chain returns a typed object, making field access and downstream validation easier than raw dictionaries.
5. Extraction pulls multiple structured entities from text by repeatedly calling an extraction function across the passage.
6. Extraction quality improves with better schema descriptions and by limiting context size (e.g., two paragraphs at a time).
7. Structured outputs can feed downstream workflows like sentiment analysis, complaint-type routing, and knowledge graph construction.