
DeepSeek-R1 0528 for 100% Local Chat with Your Files | Financial Document Analysis AI with Ollama

Venelin Valkov · 5 min read

Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Convert table-heavy earnings PDFs into Markdown with reliable table extraction before asking the model questions.

Briefing

DeepSeek-R1 (distilled) running locally through Ollama can extract and summarize complex financial statements from a 10-page Nvidia earnings PDF with numbers that closely match a commercial model (Gemini 2.5 Pro). The practical takeaway is that an 8B-parameter, quantized model can handle real-world table-heavy documents—producing accurate revenue, profitability, balance sheet, and cash flow figures—while also offering a controllable “thinking” mode that dramatically affects latency.

Setup starts with downloading the distilled DeepSeek-R1 model for Ollama and running it with the recommended parameters: temperature 0.6 and top_p 0.95. The workflow then converts the Nvidia “Financial Results for First Quarter of Fiscal 2026” PDF into Markdown using a PDF-to-Markdown pipeline (Parse, with an open-source alternative mentioned). The resulting Markdown preserves tables well enough to support downstream extraction. Because the document is large (roughly 2,000–4,000 tokens), the context window is increased in Ollama to fit the full text.
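As a concrete sketch, this setup can be driven from the Ollama Python client; the model tag, context-window size, and prompt below are illustrative assumptions, not values confirmed in the video:

```python
# Minimal sketch: run the distilled DeepSeek-R1 with the recommended sampling
# parameters and an enlarged context window so the full Markdown document fits.
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",  # assumed tag for the distilled 8B model
    messages=[{"role": "user", "content": "Summarize this earnings report: ..."}],
    options={
        "temperature": 0.6,  # recommended sampling temperature
        "top_p": 0.95,       # recommended nucleus-sampling cutoff
        "num_ctx": 16384,    # enlarged context window (illustrative value)
    },
)
print(response["message"]["content"])
```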

A key engineering lever is Ollama’s support for exposing the model’s internal “thinking” output. With thinking enabled, the model returns a separate reasoning trace plus a final answer, but responses take much longer. Disabling thinking (thinking=false) cuts wall time by roughly an order of magnitude—reported as about 8–10× faster—while still returning usable answers. This matters for local deployments where interactive speed is often the bottleneck.
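A rough sketch of the toggle, assuming a recent Ollama server and Python client that expose the think flag (the flag name and model tag are taken from current Ollama releases, not from the video):

```python
# Sketch: the same prompt with the reasoning trace on and off.
import ollama

prompt = "List the GAAP revenue figures in the document below.\n..."

# Thinking enabled: a separate reasoning trace is returned alongside the answer,
# at the cost of much higher latency.
slow = ollama.generate(model="deepseek-r1:8b", prompt=prompt, think=True)
print(slow["thinking"])   # reasoning trace
print(slow["response"])   # final answer

# Thinking disabled: only the final answer comes back, reportedly about 8-10x faster.
fast = ollama.generate(model="deepseek-r1:8b", prompt=prompt, think=False)
print(fast["response"])
```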

For document QA, prompts are structured with a role (“expert financial analyst”) and XML-style instructions that push the model to read holistically rather than grabbing early text. The output is constrained to concise Markdown, and when information isn’t present, the model is instructed to say so—an attempt to reduce hallucinations, which the author notes are still a risk for smaller models.

In tests on the Nvidia earnings Markdown, the model first produces a narrative summary: Nvidia’s first-quarter fiscal 2026 results show strong performance driven by data center growth, including a 12% quarter-over-quarter revenue increase and a 69% year-over-year jump. It also highlights developments such as the launch of Blackwell and partnerships (including HUMAIN in Saudi Arabia, G42, and Oracle). More importantly, follow-up extraction prompts pull specific GAAP figures from the tables.

Profit-and-loss extraction (revenue, gross margin, operating income, net income, diluted EPS) matches Gemini 2.5 Pro essentially exactly, with only minor rounding differences. Balance sheet extraction (cash and equivalents plus marketable securities, inventories, accounts receivable, total liabilities, total shareholders’ equity) again aligns with Gemini 2.5 Pro. Cash flow extraction (operating cash flow, free cash flow, capex-related purchases of property and equipment, payments for common stock, and ending cash) also matches, including correct units and bracketed negatives.

The broader conclusion: DeepSeek-R1 8B quantized in Ollama is unusually strong for table-based financial document analysis at this model size, at least on this “new” and relatively complex Nvidia report. A limitation remains: the DeepSeek-R1 8B model isn’t marked as tool-capable in Ollama yet, so agentic tool calling isn’t available. An open GitHub issue is expected to address this, which would enable better tool use and potentially make thinking optional without sacrificing functionality.

Cornell Notes

DeepSeek-R1 distilled (8B quantized) running locally in Ollama can accurately analyze a table-heavy Nvidia earnings PDF after converting it to Markdown. With an increased context window and tuned sampling (temperature 0.6, top_p 0.95), it produces both narrative summaries and structured financial extractions (GAAP income statement, balance sheet, and cash flow). The extracted numbers closely match Gemini 2.5 Pro, with differences limited to minor rounding. Ollama’s “thinking” toggle is a major performance lever: disabling thinking cuts latency by roughly 8–10×. The main remaining gap is tool/agent support—this model isn’t currently marked as tool-capable in Ollama, limiting agentic workflows.

How does the workflow turn a PDF earnings report into something a local LLM can reliably query?

The Nvidia earnings PDF is converted into Markdown using a PDF-to-Markdown approach (Parse was used for quick, high-accuracy table extraction). The resulting Markdown preserves table structure well enough for later prompts to extract specific GAAP line items. The Markdown text is then loaded into a variable (e.g., nvidia_earnings) and passed into Ollama as the document context.
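A minimal sketch of that flow, substituting pymupdf4llm for the PDF-to-Markdown step (the video used Parse; the file name and variable name are illustrative):

```python
# Convert the earnings PDF to Markdown, then pass the full text as context.
import ollama
import pymupdf4llm  # one open-source PDF-to-Markdown option; the video used Parse

nvidia_earnings = pymupdf4llm.to_markdown("nvidia_q1_fy2026.pdf").strip()

question = "What was total GAAP revenue for the quarter?"
answer = ollama.generate(
    model="deepseek-r1:8b",
    prompt=f"Document:\n{nvidia_earnings}\n\nQuestion: {question}",
    options={"num_ctx": 16384},  # large enough to hold the whole document
    think=False,                 # keep interactive QA fast
)
print(answer["response"])
```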

What Ollama settings were used to run DeepSeek-R1, and why do they matter for long financial documents?

The model runs with temperature 0.6 and top_p 0.95 for sampling control. The context window is increased because the document is large (roughly 2,000–4,000 tokens). Since DeepSeek-R1 is a “thinking” model, those reasoning tokens also consume context, so the context window must be expanded to avoid truncation and to keep extraction accurate.

What is the practical impact of enabling vs disabling “thinking” in Ollama?

With thinking enabled, Ollama returns a separate thinking string plus the final response, but latency is much higher (reported at around 18 seconds for a long summary). Disabling thinking (thinking=false) removes the thinking output and speeds responses up dramatically (about 8–10× faster) while still producing usable answers. This is crucial for interactive local chat with documents.

How were prompts structured to reduce shallow extraction and improve answer quality?

Prompts use a role (“expert financial analyst”) and XML-style instructions that tell the model to consider the complete document, not just the first relevant snippet. They also request concise Markdown output and instruct the model to state when information isn’t available in the document. The document text and the user question are both included in the prompt, with whitespace trimmed to avoid formatting artifacts.
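A sketch of such a prompt template; the tag names and wording are assumptions about the structure described, not the author’s verbatim prompt:

```python
# Role + XML-style instructions + document + question, with whitespace trimmed.
PROMPT_TEMPLATE = """You are an expert financial analyst.

<instructions>
- Read the complete document before answering; do not rely on the first matching snippet.
- Answer concisely in Markdown.
- If the information is not in the document, say so explicitly.
</instructions>

<document>
{document}
</document>

<question>
{question}
</question>
"""

def build_prompt(document: str, question: str) -> str:
    # Strip leading/trailing whitespace to avoid formatting artifacts in the prompt.
    return PROMPT_TEMPLATE.format(document=document.strip(), question=question.strip())
```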

Which financial statement fields were extracted, and how accurate were the results compared to Gemini 2.5 Pro?

Profit/loss (GAAP) extraction requested revenue, gross margin, operating income, net income, and diluted EPS. Balance sheet extraction requested cash + cash equivalents + marketable securities, inventories, accounts receivable (net), total liabilities, and total shareholders’ equity. Cash flow extraction requested net cash from operating activities, free cash flow, purchases of property and equipment, payments related to purchases of common stock, and ending cash equivalents. Across these categories, the extracted values matched Gemini 2.5 Pro essentially exactly, with only minor rounding differences.
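The three extraction passes can be run as plain questions against the same Markdown; the wording below paraphrases the field lists above and is not the exact prompt from the video, and the file path is illustrative:

```python
# Ask for each statement's fields one at a time against the same document.
import ollama

document = open("nvidia_q1_fy2026.md", encoding="utf-8").read()  # illustrative path

extraction_questions = [
    "From the GAAP income statement, extract revenue, gross margin, operating "
    "income, net income, and diluted EPS as a Markdown table.",
    "From the balance sheet, extract cash and cash equivalents plus marketable "
    "securities, inventories, accounts receivable (net), total liabilities, and "
    "total shareholders' equity as a Markdown table.",
    "From the cash flow statement, extract net cash from operating activities, "
    "free cash flow, purchases of property and equipment, payments related to "
    "purchases of common stock, and ending cash and cash equivalents.",
]

for question in extraction_questions:
    result = ollama.generate(
        model="deepseek-r1:8b",
        prompt=f"You are an expert financial analyst.\n\n<document>\n{document}\n"
               f"</document>\n\n<question>\n{question}\n</question>",
        options={"temperature": 0.6, "top_p": 0.95, "num_ctx": 16384},
        think=False,
    )
    print(result["response"], "\n")
```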

What limitation remains for building agentic workflows with this model in Ollama?

Tool/agent calling wasn’t available because the DeepSeek-R1 8B model isn’t marked with tool support in Ollama (as of the transcript). The workflow therefore focuses on direct generate-based QA and extraction rather than tool-using agents. A GitHub issue is referenced as the likely path to enabling tool calling and improving agentic behavior.

Review Questions

  1. What changes to context window and sampling parameters are necessary when analyzing a multi-page earnings PDF locally with DeepSeek-R1 in Ollama?
  2. How does disabling the “thinking” output affect both latency and the structure of the returned response?
  3. Which GAAP fields were extracted from the income statement, balance sheet, and cash flow statement, and what evidence suggests the extraction was accurate?

Key Points

  1. Convert table-heavy earnings PDFs into Markdown with reliable table extraction before asking the model questions.
  2. Increase Ollama context window to fit the full Markdown document plus any “thinking” tokens consumed by DeepSeek-R1.
  3. Tune sampling (temperature 0.6, top_p 0.95) to balance determinism and variability for consistent extraction.
  4. Use structured prompts (role + holistic instructions + concise Markdown output) to reduce shallow, snippet-based answers.
  5. Toggle thinking output to trade off accuracy/traceability versus speed; disabling thinking can cut latency by roughly 8–10×.
  6. Validate extracted GAAP line items against a trusted reference (e.g., Gemini 2.5 Pro) to confirm table parsing and number fidelity.
  7. Expect tool/agent calling limitations until the model is marked tool-capable in Ollama; direct generation works well even without agents.

Highlights

  • DeepSeek-R1 8B quantized in Ollama produced GAAP income statement, balance sheet, and cash flow extractions that matched Gemini 2.5 Pro essentially perfectly (minor rounding only).
  • Disabling Ollama’s “thinking” output reduced response time by about 8–10× while still returning correct financial answers.
  • A Markdown conversion step that preserves tables was central to making accurate number extraction possible from a 10-page Nvidia earnings report.
  • The model’s current lack of tool support in Ollama limits agentic workflows, even though document QA and extraction work well.

Topics

  • Local Document Chat
  • DeepSeek-R1
  • Ollama
  • Financial Statement Extraction
  • PDF to Markdown
