DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with numbers that closely match a commercial model (Gemini 2.5 Pro). The practical takeaway is that an 8B-parameter, quantized model can handle real-world table-heavy documents—producing accurate revenue, profitability, balance sheet, and cash flow figures—while also offering a controllable “thinking” mode that dramatically affects latency.
Setup starts with downloading the distilled DeepSeek-R1 model for Ollama and running it with the recommended sampling parameters: temperature 0.6 and top_p 0.95. The workflow then converts the Nvidia “Financial Results for First Quarter of Fiscal 2026” PDF into Markdown using a PDF-to-Markdown pipeline (Parse, with an open-source alternative mentioned). The resulting Markdown preserves tables well enough to support downstream extraction. Because the full document exceeds Ollama’s default context window (roughly 2,000–4,000 tokens, depending on version), the context window is increased so the entire text fits.
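The setup above can be sketched against Ollama’s local REST API using only the standard library. The model tag (`deepseek-r1:8b`) and the enlarged `num_ctx` value are assumptions for illustration; the sampling parameters are the ones recommended in the video.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_request(model: str, prompt: str, num_ctx: int = 16384) -> dict:
    """Build a /api/chat payload with the recommended DeepSeek-R1 sampling.

    num_ctx enlarges the context window so the full Markdown document fits
    (16384 is an assumed value, not from the original video).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"temperature": 0.6, "top_p": 0.95, "num_ctx": num_ctx},
    }

def ask(model: str, prompt: str) -> str:
    """Send a single prompt to a running Ollama server and return the answer."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Inspect the payload without needing a live server:
payload = build_request("deepseek-r1:8b", "Summarize the earnings report.")
print(json.dumps(payload["options"]))
```

Calling `ask("deepseek-r1:8b", prompt)` then performs the actual round trip against the local server.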
A key engineering lever is Ollama’s support for exposing the model’s internal “thinking” output. With thinking enabled, the model returns a separate reasoning trace plus a final answer, but responses take much longer. Disabling thinking (thinking=false) cuts wall time by roughly an order of magnitude—reported as about 8–10× faster—while still returning usable answers. This matters for local deployments where interactive speed is often the bottleneck.
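The thinking toggle maps to a `think` flag on Ollama’s `/api/chat` request; with it enabled, the response carries a separate `message.thinking` trace alongside `message.content`. A minimal sketch (the model tag is an assumption):

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str, think: bool) -> dict:
    """Build a /api/chat body with the thinking toggle set explicitly."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "think": think,  # False was reported roughly 8-10x faster
    }

def chat(payload: dict, url: str = "http://localhost:11434/api/chat") -> dict:
    """POST the payload to a running Ollama server and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Compare the two payloads without needing a live server:
slow = chat_payload("deepseek-r1:8b", "What was Q1 revenue?", think=True)
fast = chat_payload("deepseek-r1:8b", "What was Q1 revenue?", think=False)
print(slow["think"], fast["think"])
```

With `think=True`, `chat(...)["message"]` includes the reasoning trace under a separate key; with `think=False` only the final answer is returned.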
For document QA, prompts are structured with a role (“expert financial analyst”) and XML-style instructions that push the model to read holistically rather than grabbing early text. The output is constrained to concise Markdown, and when information isn’t present, the model is instructed to say so—an attempt to reduce hallucinations, which the author notes are still a risk for smaller models.
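A minimal version of such a prompt can be sketched as follows; the role text and tag names are paraphrases of the approach described, not the video’s exact wording:

```python
ROLE = "You are an expert financial analyst."  # role framing, paraphrased

def build_prompt(document_md: str, question: str) -> str:
    """Role + XML-style instructions pushing holistic reading, concise
    Markdown output, and an explicit 'not present' fallback."""
    return (
        f"{ROLE}\n"
        "<instructions>\n"
        "Read the entire document before answering; do not answer from the\n"
        "first matching snippet alone. Respond in concise Markdown. If the\n"
        "requested information is not in the document, state that it is not\n"
        "present instead of guessing.\n"
        "</instructions>\n"
        f"<document>\n{document_md}\n</document>\n"
        f"<question>\n{question}\n</question>"
    )

print(build_prompt("# Q1 FY2026 results...", "What was GAAP diluted EPS?"))
```

The `<document>` block holds the converted Markdown, so the same template works for both the narrative summary and the targeted extraction prompts.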
In tests on the Nvidia earnings Markdown, the model first produces a narrative summary: Nvidia’s first-quarter fiscal 2026 results show strong performance driven by data center growth, including a 12% quarter-over-quarter revenue increase and a 69% year-over-year jump. It also highlights developments such as the launch of Blackwell and partnerships (including HUMAIN in Saudi Arabia, G42, and Oracle). More importantly, follow-up extraction prompts pull specific GAAP figures from the tables.
Profit-and-loss extraction (revenue, gross margin, operating income, net income, diluted EPS) matches Gemini 2.5 Pro essentially exactly, with only minor rounding differences. Balance sheet extraction (cash and equivalents plus marketable securities, inventories, accounts receivable, total liabilities, total shareholders’ equity) again aligns with Gemini 2.5 Pro. Cash flow extraction (operating cash flow, free cash flow, capex-related purchases of property and equipment, payments for common stock, and ending cash) also matches, including correct units and bracketed negatives.
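A lightweight way to reproduce this kind of cross-check is to compare each extracted line item against the reference model’s value with a small relative tolerance that absorbs rounding differences. The figures below are illustrative placeholders, not transcribed from the report:

```python
def matches(local: float, reference: float, rel_tol: float = 0.005) -> bool:
    """True when two figures agree within ~0.5%, absorbing minor rounding."""
    return abs(local - reference) <= rel_tol * max(abs(local), abs(reference), 1.0)

# Illustrative figures in $ millions (placeholders, not from the report).
local_extraction = {"revenue": 44_000, "net_income": 18_800}
gemini_reference = {"revenue": 44_000, "net_income": 18_800}

mismatches = [
    k for k in local_extraction
    if not matches(local_extraction[k], gemini_reference[k])
]
print("mismatches:", mismatches)
```

An empty `mismatches` list corresponds to the “matches essentially exactly, with only minor rounding differences” result reported above; bracketed negatives from the cash flow tables should be normalized to signed floats before comparing.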
The broader conclusion: DeepSeek-R1 8B quantized in Ollama is unusually strong for table-based financial document analysis at this model size, at least on this recent and relatively complex Nvidia report. A limitation remains: the DeepSeek-R1 8B model isn’t marked as tool-capable in Ollama yet, so agentic tool calling isn’t available. An open GitHub issue is expected to address this, which would enable better tool use and potentially make thinking optional without sacrificing functionality.
Cornell Notes
DeepSeek-R1 distilled (8B quantized) running locally in Ollama can accurately analyze a table-heavy Nvidia earnings PDF after converting it to Markdown. With an increased context window and tuned sampling (temperature 0.6, top_p 0.95), it produces both narrative summaries and structured financial extractions (GAAP income statement, balance sheet, and cash flow). The extracted numbers closely match Gemini 2.5 Pro, with differences limited to minor rounding. Ollama’s “thinking” toggle is a major performance lever: disabling thinking cuts latency by roughly 8–10×. The main remaining gap is tool/agent support—this model isn’t currently marked as tool-capable in Ollama, limiting agentic workflows.
How does the workflow turn a PDF earnings report into something a local LLM can reliably query?
What Ollama settings were used to run DeepSeek-R1, and why do they matter for long financial documents?
What is the practical impact of enabling vs disabling “thinking” in Ollama?
How were prompts structured to reduce shallow extraction and improve answer quality?
Which financial statement fields were extracted, and how accurate were the results compared to Gemini 2.5 Pro?
What limitation remains for building agentic workflows with this model in Ollama?
Review Questions
- What changes to context window and sampling parameters are necessary when analyzing a multi-page earnings PDF locally with DeepSeek-R1 in Ollama?
- How does disabling the “thinking” output affect both latency and the structure of the returned response?
- Which GAAP fields were extracted from the income statement, balance sheet, and cash flow statement, and what evidence suggests the extraction was accurate?
Key Points
1. Convert table-heavy earnings PDFs into Markdown with reliable table extraction before asking the model questions.
2. Increase the Ollama context window to fit the full Markdown document plus any “thinking” tokens consumed by DeepSeek-R1.
3. Tune sampling (temperature 0.6, top_p 0.95) to balance determinism and variability for consistent extraction.
4. Use structured prompts (role + holistic instructions + concise Markdown output) to reduce shallow, snippet-based answers.
5. Toggle thinking output to trade off accuracy/traceability versus speed; disabling thinking can cut latency by roughly 8–10×.
6. Validate extracted GAAP line items against a trusted reference (e.g., Gemini 2.5 Pro) to confirm table parsing and number fidelity.
7. Expect tool/agent calling limitations until the model is marked tool-capable in Ollama; direct generation works well even without agents.