100% Local AI Agents with DeepSeek-R1, Ollama, Pydantic and LangGraph - Private Agentic Workflow
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A fully local “agentic” workflow can fetch Reddit posts, let a person steer which threads matter, then run semantic filtering and structured analysis to produce a grounded executive report—complete with references—using models that run on a user’s own machine. The practical payoff is privacy and control: the pipeline pulls public Reddit data, processes it locally with DeepSeek-R1 (via local inference), and only pauses for human input at the point where relevance is subjective.
The workflow starts with a user-provided subreddit name and a maximum number of posts to fetch. A first agent pulls the posts and their associated comments using Reddit’s JSON endpoints (including comment replies), applying a minimum score threshold and configurable depth so the comment tree doesn’t explode. Each post is normalized into a structured data model (Pydantic-based), capturing fields like title, text, score, author, category (optional), and a recursive list of comments.
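A minimal sketch of this fetch-and-normalize step, using only the standard library plus Pydantic. The endpoint pattern follows Reddit's public JSON API; the class names (`RedditComment`, `RedditPost`), helper names, and default thresholds are illustrative assumptions, not the video's actual code.

```python
from __future__ import annotations

import json
import urllib.request
from typing import List, Optional

from pydantic import BaseModel


class RedditComment(BaseModel):
    author: str
    text: str
    score: int
    replies: List["RedditComment"] = []  # recursive comment tree


class RedditPost(BaseModel):
    title: str
    text: str
    score: int
    author: str
    category: Optional[str] = None  # Reddit's "link_flair_text"
    permalink: str
    comments: List[RedditComment] = []


def parse_comments(children: list, min_score: int, depth: int) -> List[RedditComment]:
    """Recursively walk the comment tree, pruning by score and capping depth."""
    if depth == 0:
        return []
    out: List[RedditComment] = []
    for child in children:
        if child.get("kind") != "t1":  # skip "more comments" stubs
            continue
        data = child["data"]
        if data.get("score", 0) < min_score:
            continue
        replies_raw = data.get("replies") or {}  # Reddit sends "" when empty
        nested = replies_raw.get("data", {}).get("children", []) if isinstance(replies_raw, dict) else []
        out.append(
            RedditComment(
                author=data.get("author", "[deleted]"),
                text=data.get("body", ""),
                score=data["score"],
                replies=parse_comments(nested, min_score, depth - 1),
            )
        )
    return out


def _get_json(url: str) -> object:
    req = urllib.request.Request(url, headers={"User-Agent": "local-agent-demo/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_posts(subreddit: str, limit: int, min_score: int = 2, max_depth: int = 3) -> List[RedditPost]:
    """Fetch a subreddit listing, then each post's comment tree at <permalink>.json."""
    listing = _get_json(f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}")
    posts: List[RedditPost] = []
    for item in listing["data"]["children"]:
        d = item["data"]
        thread = _get_json(f"https://www.reddit.com{d['permalink']}.json")
        posts.append(
            RedditPost(
                title=d["title"],
                text=d.get("selftext", ""),
                score=d["score"],
                author=d["author"],
                category=d.get("link_flair_text"),
                permalink=d["permalink"],
                comments=parse_comments(thread[1]["data"]["children"], min_score, max_depth),
            )
        )
    return posts
```

The score threshold and depth cap are what keep the recursive comment parse from exploding on large threads.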
Next comes the human-in-the-loop step. The graph execution intentionally interrupts after fetching so the user can choose which posts to focus on. That selection then feeds a semantic search tool: the system converts posts (title, category, text, and comment text) into embeddings and stores them in an in-memory vector store. A custom LangChain tool (“search documents”) takes the user’s query, cleans it, retrieves the most relevant post permalinks, and returns only those matches back into the workflow.
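The retrieval logic can be sketched with a toy stand-in for the embedding model. The real pipeline uses a local embedding model and LangChain’s in-memory vector store; here a bag-of-words cosine similarity plays the embedding’s role so the select-by-relevance flow is visible end to end. `search_documents`, the index layout (permalink mapped to concatenated post and comment text), and `k` are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag of words (real pipeline: an embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search_documents(query: str, index: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k most relevant permalinks (with scores) for a cleaned user query."""
    q = embed(query)
    scored = [(permalink, cosine(q, embed(text))) for permalink, text in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the permalinks this returns flow forward into the analysis step, which is how the user’s query prunes the workload.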
For each selected post, a second agent performs structured analysis with a model configured for JSON-schema output. The analysis prompt asks for a summary, key points, main topics, controversies, takeaways, and sentiment—explicitly enumerating sentiment options—and it grounds the output in both the post text and the associated comments. This structured output is then fed into a final report writer agent.
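The structured-output contract can be expressed as a Pydantic model whose JSON schema is handed to the LLM. Field names mirror the prompt described above; the class name and the exact sentiment labels are assumptions.

```python
from typing import List, Literal

from pydantic import BaseModel


class PostAnalysis(BaseModel):
    summary: str
    key_points: List[str]
    main_topics: List[str]
    controversies: List[str]
    takeaways: List[str]
    # The prompt enumerates the allowed sentiment options; these labels are assumed.
    sentiment: Literal["positive", "negative", "neutral", "mixed"]


# The derived JSON schema is what gets passed to the model (e.g. via a
# structured-output / format option) so the reply parses back into PostAnalysis.
schema = PostAnalysis.model_json_schema()
```

Constraining the model to this schema is what makes the downstream report step reliable: the report writer consumes typed fields, not free-form prose.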
The report writer uses DeepSeek-R1 to generate an executive summary that answers the user’s original question, while also listing titles, summaries, takeaways, and references to the specific Reddit posts used. The workflow is built with LangGraph’s functional API and uses a checkpointer (memory saver) to persist state so the run can pause at the interrupt and later resume once the user provides a filter query.
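Conceptually, the interrupt-and-resume mechanics work like the plain-Python sketch below. LangGraph’s functional API, `interrupt`, and its memory-saver checkpointer do the real work; this generator-based stand-in only shows the control flow, with every name here an illustrative assumption.

```python
from typing import Dict, Iterator


def workflow(subreddit: str, limit: int) -> Iterator[dict]:
    # Stand-in for the fetch agent.
    posts = [f"post-{i} from r/{subreddit}" for i in range(limit)]
    # Pause here for human input; resumes with the user's filter query.
    query = yield {"status": "interrupted", "posts": posts}
    # Stand-in for semantic search + analysis + report writing.
    selected = [p for p in posts if query in p]
    yield {"status": "done", "report": f"report on {len(selected)} post(s) for '{query}'"}


# A checkpointer persists the paused run per thread id so the UI can
# resume it later (LangGraph's memory saver plays this role).
checkpoints: Dict[str, Iterator[dict]] = {}


def start(thread_id: str, subreddit: str, limit: int) -> dict:
    run = workflow(subreddit, limit)
    checkpoints[thread_id] = run
    return next(run)  # executes up to the interrupt


def resume(thread_id: str, query: str) -> dict:
    return checkpoints[thread_id].send(query)  # resumes with the user's input
```

The key property mirrored here is that state survives between the two calls, so the UI can show the fetched posts, wait indefinitely, and continue exactly where the graph stopped.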
A key engineering choice is mixing models by task. Smaller, faster models handle tool use and analysis (e.g., Qwen 2.5 for semantic/tool-related steps), while the larger DeepSeek-R1 model is reserved for the final report. The transcript also notes that DeepSeek-R1 performs better for tool calling than Qwen 2.5 in some cases, but the overall design keeps costs and latency manageable.
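One way to wire that per-task split is a small step-to-model map; the tags follow the models named in the transcript (Qwen 2.5 for tool and analysis steps, DeepSeek-R1 for the report), while the map and helper are assumptions rather than the video’s actual code.

```python
# Hypothetical per-step model routing, Ollama-style tags.
MODELS = {
    "tool_calling": "qwen2.5",
    "analysis": "qwen2.5",
    "report": "deepseek-r1",
}


def model_for(step: str) -> str:
    """Return the model tag for a pipeline step, defaulting to the small model."""
    return MODELS.get(step, "qwen2.5")
```

Centralizing the choice makes it trivial to swap the report model or promote a step to the larger model when quality demands it.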
Finally, the pipeline is wrapped in a Streamlit UI. Users enter a subreddit and post limit, fetch posts, then interact through a chat-style input to steer filtering and trigger report generation. A demo on a public subreddit compares outputs across model choices (e.g., a “Grok 3 versus OpenAI o3” framing in the example), showing how the system produces a report with references in roughly a minute on modest hardware. The result is a private, reproducible agentic workflow that can be extended, for example by saving fetched posts to a local or remote database for longer-term analysis.
Cornell Notes
The workflow builds a local agentic pipeline that turns Reddit data into a grounded executive report. It fetches posts and recursively parses comments from Reddit’s JSON endpoints, then pauses for human input to choose which threads to prioritize. After that, a semantic search tool embeds post text and comment text into an in-memory vector store, retrieves the most relevant permalinks for a user query, and passes only those selections forward. A structured-analysis agent produces JSON outputs (summary, key points, topics, controversies, takeaways, sentiment) for each selected post. A final report writer agent (DeepSeek-R1) compiles an executive summary that answers the user’s question and includes references to the analyzed posts.
How does the system fetch Reddit content while keeping the workflow local and structured?
Where does the “human in the loop” happen, and why is it placed there?
How does semantic filtering work once the user provides a query?
What does “structured output” mean in this pipeline, and what fields are produced?
Why mix different models across steps instead of using one model everywhere?
How does the UI coordinate with the paused/resumed workflow?
Review Questions
- What specific data transformations are applied to Reddit posts and comments before embeddings are created for semantic search?
- How does the LangGraph interrupt change the execution flow, and what inputs are required to resume the graph?
- Which structured fields does the post-analysis schema require, and how are those fields used to build the final executive report?
Key Points
1. The pipeline fetches Reddit posts and recursively parses comment replies from Reddit JSON endpoints, then normalizes everything into Pydantic models for downstream processing.
2. LangGraph’s interrupt is used to pause after fetching so a human can choose which posts to focus on before any semantic search or analysis runs.
3. Semantic filtering is implemented as a custom LangChain tool that embeds post and comment text into an in-memory vector store and retrieves relevant permalinks for a user query.
4. Per-post analysis is forced into JSON-schema structured output (summary, key points, topics, controversies, sentiment, takeaways), making the final report generation deterministic and grounded.
5. DeepSeek-R1 is used for the final report writer step, while smaller models handle tool calling and intermediate analysis to balance quality, speed, and cost.
6. A Streamlit UI orchestrates the workflow: it collects subreddit inputs, runs the graph up to the interrupt, then resumes execution after the user submits a filter query.
7. The workflow is designed for local execution with configurable model providers (e.g., local inference via Ollama or remote APIs like Groq), emphasizing privacy and reproducibility.