100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis to produce a grounded executive report—complete with references—using models that run on a user’s own machine. The practical payoff is privacy and control: the pipeline pulls public Reddit data, processes it locally with DeepSeek-R1 (via local inference), and only pauses for human input at the point where relevance is subjective.
The workflow starts with a user-provided subreddit name and a maximum number of posts to fetch. A first agent pulls the posts and their associated comments using Reddit’s JSON endpoints (including comment replies), applying a minimum score threshold and configurable depth so the comment tree doesn’t explode. Each post is normalized into a structured data model (Pydantic-based), capturing fields like title, text, score, author, category (optional), and a recursive list of comments.
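A minimal sketch of this fetch-and-normalize step, using only the standard library plus Pydantic. The endpoint pattern follows Reddit's public JSON API; the class names (`RedditComment`, `RedditPost`), helper names, and default thresholds are illustrative assumptions, not the video's actual code.

```python
from __future__ import annotations

import json
import urllib.request
from typing import List, Optional

from pydantic import BaseModel


class RedditComment(BaseModel):
    author: str
    text: str
    score: int
    replies: List["RedditComment"] = []  # recursive comment tree


class RedditPost(BaseModel):
    title: str
    text: str
    score: int
    author: str
    category: Optional[str] = None  # Reddit's "link_flair_text"
    permalink: str
    comments: List[RedditComment] = []


def parse_comments(children: list, min_score: int, depth: int) -> List[RedditComment]:
    """Recursively walk the comment tree, pruning by score and capping depth."""
    if depth == 0:
        return []
    out: List[RedditComment] = []
    for child in children:
        if child.get("kind") != "t1":  # skip "more comments" stubs
            continue
        data = child["data"]
        if data.get("score", 0) < min_score:
            continue
        replies_raw = data.get("replies") or {}  # Reddit sends "" when empty
        nested = replies_raw.get("data", {}).get("children", []) if isinstance(replies_raw, dict) else []
        out.append(
            RedditComment(
                author=data.get("author", "[deleted]"),
                text=data.get("body", ""),
                score=data["score"],
                replies=parse_comments(nested, min_score, depth - 1),
            )
        )
    return out


def _get_json(url: str) -> object:
    req = urllib.request.Request(url, headers={"User-Agent": "local-agent-demo/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_posts(subreddit: str, limit: int, min_score: int = 2, max_depth: int = 3) -> List[RedditPost]:
    """Fetch a subreddit listing, then each post's comment tree at <permalink>.json."""
    listing = _get_json(f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}")
    posts: List[RedditPost] = []
    for item in listing["data"]["children"]:
        d = item["data"]
        thread = _get_json(f"https://www.reddit.com{d['permalink']}.json")
        posts.append(
            RedditPost(
                title=d["title"],
                text=d.get("selftext", ""),
                score=d["score"],
                author=d["author"],
                category=d.get("link_flair_text"),
                permalink=d["permalink"],
                comments=parse_comments(thread[1]["data"]["children"], min_score, max_depth),
            )
        )
    return posts
```

The score threshold and depth cap are what keep the recursive comment parse from exploding on large threads.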
Next comes the human-in-the-loop step. The graph execution intentionally interrupts after fetching so the user can choose which posts to focus on. That selection then feeds a semantic search tool: the system converts posts (title, category, text, and comment text) into embeddings and stores them in an in-memory vector store. A custom LangChain tool (“search documents”) takes the user’s query, cleans it, retrieves the most relevant post permalinks, and returns only those matches back into the workflow.
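The retrieval logic can be sketched with a toy stand-in for the embedding model. The real pipeline uses a local embedding model and LangChain’s in-memory vector store; here a bag-of-words cosine similarity plays the embedding’s role so the select-by-relevance flow is visible end to end. `search_documents`, the index layout (permalink mapped to concatenated post and comment text), and `k` are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag of words (real pipeline: an embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search_documents(query: str, index: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k most relevant permalinks (with scores) for a cleaned user query."""
    q = embed(query)
    scored = [(permalink, cosine(q, embed(text))) for permalink, text in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the permalinks this returns flow forward into the analysis step, which is how the user’s query prunes the workload.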
For each selected post, a second agent performs structured analysis with a model configured for JSON-schema output. The analysis prompt asks for a summary, key points, main topics, controversies, takeaways, and sentiment—explicitly enumerating sentiment options—and it grounds the output in both the post text and the associated comments. This structured output is then fed into a final report writer agent.
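The structured-output contract can be expressed as a Pydantic model whose JSON schema is handed to the LLM. Field names mirror the prompt described above; the class name and the exact sentiment labels are assumptions.

```python
from typing import List, Literal

from pydantic import BaseModel


class PostAnalysis(BaseModel):
    summary: str
    key_points: List[str]
    main_topics: List[str]
    controversies: List[str]
    takeaways: List[str]
    # The prompt enumerates the allowed sentiment options; these labels are assumed.
    sentiment: Literal["positive", "negative", "neutral", "mixed"]


# The derived JSON schema is what gets passed to the model (e.g. via a
# structured-output / format option) so the reply parses back into PostAnalysis.
schema = PostAnalysis.model_json_schema()
```

Constraining the model to this schema is what makes the downstream report step reliable: the report writer consumes typed fields, not free-form prose.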
The report writer uses DeepSeek-R1 to generate an executive summary that answers the user’s original question, while also listing titles, summaries, takeaways, and references to the specific Reddit posts used. The workflow is built with LangGraph’s functional API and uses a checkpointer (memory saver) to persist state so the run can pause at the interrupt and later resume once the user provides a filter query.
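Conceptually, the interrupt-and-resume mechanics work like the plain-Python sketch below. LangGraph’s functional API, `interrupt`, and its memory-saver checkpointer do the real work; this generator-based stand-in only shows the control flow, with every name here an illustrative assumption.

```python
from typing import Dict, Iterator


def workflow(subreddit: str, limit: int) -> Iterator[dict]:
    # Stand-in for the fetch agent.
    posts = [f"post-{i} from r/{subreddit}" for i in range(limit)]
    # Pause here for human input; resumes with the user's filter query.
    query = yield {"status": "interrupted", "posts": posts}
    # Stand-in for semantic search + analysis + report writing.
    selected = [p for p in posts if query in p]
    yield {"status": "done", "report": f"report on {len(selected)} post(s) for '{query}'"}


# A checkpointer persists the paused run per thread id so the UI can
# resume it later (LangGraph's memory saver plays this role).
checkpoints: Dict[str, Iterator[dict]] = {}


def start(thread_id: str, subreddit: str, limit: int) -> dict:
    run = workflow(subreddit, limit)
    checkpoints[thread_id] = run
    return next(run)  # executes up to the interrupt


def resume(thread_id: str, query: str) -> dict:
    return checkpoints[thread_id].send(query)  # resumes with the user's input
```

The key property mirrored here is that state survives between the two calls, so the UI can show the fetched posts, wait indefinitely, and continue exactly where the graph stopped.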
A key engineering choice is mixing models by task. Smaller, faster models handle tool use and analysis (e.g., Qwen 2.5 for semantic/tool-related steps), while the larger DeepSeek-R1 model is reserved for the final report. The transcript also notes that DeepSeek-R1 performs better for tool calling than Qwen 2.5 in some cases, but the overall design keeps costs and latency manageable.
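One way to wire that per-task split is a small step-to-model map; the tags follow the models named in the transcript (Qwen 2.5 for tool and analysis steps, DeepSeek-R1 for the report), while the map and helper are assumptions rather than the video’s actual code.

```python
# Hypothetical per-step model routing, Ollama-style tags.
MODELS = {
    "tool_calling": "qwen2.5",
    "analysis": "qwen2.5",
    "report": "deepseek-r1",
}


def model_for(step: str) -> str:
    """Return the model tag for a pipeline step, defaulting to the small model."""
    return MODELS.get(step, "qwen2.5")
```

Centralizing the choice makes it trivial to swap the report model or promote a step to the larger model when quality demands it.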
Finally, the pipeline is wrapped in a Streamlit UI. Users enter a subreddit and post limit, fetch posts, then interact through a chat-style input to steer filtering and trigger report generation. A demo on a public subreddit compares outputs across model choices (e.g., a “Grok 3 versus OpenAI o3” framing in the example), showing how the system produces a report with references in roughly a minute on modest hardware. The result is a private, reproducible agentic workflow that can be extended, for example by saving fetched posts to a local or remote database for longer-term analysis.
Cornell Notes
The workflow builds a local agentic pipeline that turns Reddit data into a grounded executive report. It fetches posts and recursively parses comments from Reddit’s JSON endpoints, then pauses for human input to choose which threads to prioritize. After that, a semantic search tool embeds post text and comment text into an in-memory vector store, retrieves the most relevant permalinks for a user query, and passes only those selections forward. A structured-analysis agent produces JSON outputs (summary, key points, topics, controversies, takeaways, sentiment) for each selected post. A final report writer agent (DeepSeek-R1) compiles an executive summary that answers the user’s question and includes references to the analyzed posts.
How does the system fetch Reddit content while keeping the workflow local and structured?
Where does the “human in the loop” happen, and why is it placed there?
How does semantic filtering work once the user provides a query?
What does “structured output” mean in this pipeline, and what fields are produced?
Why mix different models across steps instead of using one model everywhere?
How does the UI coordinate with the paused/resumed workflow?
Review Questions
- What specific data transformations are applied to Reddit posts and comments before embeddings are created for semantic search?
- How does the LangGraph interrupt change the execution flow, and what inputs are required to resume the graph?
- Which structured fields does the post-analysis schema require, and how are those fields used to build the final executive report?
Key Points
1. The pipeline fetches Reddit posts and recursively parses comment replies from Reddit JSON endpoints, then normalizes everything into Pydantic models for downstream processing.
2. LangGraph’s interrupt is used to pause after fetching so a human can choose which posts to focus on before any semantic search or analysis runs.
3. Semantic filtering is implemented as a custom LangChain tool that embeds post and comment text into an in-memory vector store and retrieves relevant permalinks for a user query.
4. Per-post analysis is forced into JSON-schema structured output (summary, key points, topics, controversies, sentiment, takeaways), making the final report generation deterministic and grounded.
5. DeepSeek-R1 is used for the final report writer step, while smaller models handle tool calling and intermediate analysis to balance quality, speed, and cost.
6. A Streamlit UI orchestrates the workflow: it collects subreddit inputs, runs the graph up to the interrupt, then resumes execution after the user submits a filter query.
7. The workflow is designed for local execution with configurable model providers (e.g., local inference via Ollama or remote APIs like Groq), emphasizing privacy and reproducibility.