the ONLY way to run Deepseek...
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Running DeepSeek locally can keep prompts off third-party servers, but “local” isn’t automatically the same as “locked down.” The core message is that the safest practical setup is to run open models on your own machine—and then verify network behavior—using tools like LM Studio or Ollama, with Docker as an extra isolation layer.
DeepSeek’s rise is framed as a turning point for AI expectations: the model family is portrayed as outperforming major competitors while reportedly training with far less compute. The transcript contrasts DeepSeek’s claimed training cost (under $6 million) and GPU count (about 2,000 Nvidia H800 units constrained by export limits) against OpenAI’s reported scale (over $100 million and 10,000+ top-tier GPUs). The takeaway isn’t just that DeepSeek is strong; it’s that clever engineering and post-training techniques—such as self-distilled reasoning—can reduce reliance on brute-force compute. That matters because it supports the idea that smaller teams can compete by optimizing methods, and it also matters for safety because DeepSeek’s open-source availability enables local execution.
Safety concerns center on where data goes. When using DeepSeek through a browser or app, prompts are handled by the company’s servers, meaning the user’s data is stored and governed by that provider’s policies. The transcript adds a geopolitical angle: DeepSeek’s servers are described as being in China, where cybersecurity laws can give authorities broad powers to request access to data stored within the country. The proposed mitigation is straightforward—run the model locally so the data stays on the user’s device.
To make local running accessible, the transcript recommends two paths. LM Studio provides a GUI workflow for downloading and running models, including DeepSeek variants, and it surfaces how much of a model can be offloaded to the GPU depending on model size and available VRAM. Ollama offers a simpler CLI-first approach (download and run via commands), but the transcript emphasizes the hardware ceiling: DeepSeek R1 comes in sizes from about 1.5B parameters up to the full 671B-parameter model, which requires serious compute. For most users, the practical range is described as roughly 1.5B to 14B on typical setups, with larger models reserved for high-end GPU servers.
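As a concrete illustration of the Ollama path (a sketch based on the deepseek-r1 tags published in the Ollama model library, not commands quoted from the video), pulling and running a mid-sized variant looks like this:

```powershell
# Pull a distilled DeepSeek R1 variant sized for a typical consumer GPU;
# smaller (1.5b) or larger (14b, 32b) tags can be substituted to match VRAM.
ollama pull deepseek-r1:7b

# Start an interactive chat session; inference runs entirely on this machine.
ollama run deepseek-r1:7b

# Confirm which models are installed locally.
ollama list
```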
The safety claim is then tested rather than assumed. While running Ollama, a PowerShell script monitors network connections tied to the Ollama process. The transcript reports that during local inference, connections remain limited to the local API port (not external IPs). It also notes that external connections appear only when downloading a new model, and that Ollama lacks built-in functionality to fetch internet content during inference.
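The notes don't reproduce the video's script, but the check it describes can be sketched in PowerShell along these lines (an approximation, not the original script): list the TCP connections owned by the Ollama process while a chat is running and confirm they stay on the local machine.

```powershell
# Sketch of the verification idea: enumerate TCP connections owned by the
# Ollama process. During local inference only loopback entries on the API
# port (11434) are expected; remote addresses should appear only while a
# model is being downloaded.
Get-Process -Name ollama -ErrorAction SilentlyContinue | ForEach-Object {
    Get-NetTCPConnection -OwningProcess $_.Id -ErrorAction SilentlyContinue |
        Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, State
}
```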
Finally, the transcript addresses the “what if something changes?” risk: if a model or runtime ever gained internet-capable behavior, running directly on the OS could expose files and system settings. The recommended hardening step is to run Ollama inside Docker. Docker isolates the application from the rest of the operating system while still allowing GPU access (via Nvidia container tooling where needed). The Docker approach uses constrained privileges, GPU access, a mounted volume for settings, an exposed local API port, resource caps, and a read-only filesystem to reduce the blast radius. The result is a local DeepSeek workflow that’s not only offline during inference, but also more tightly sandboxed against unexpected network or system access.
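The exact command isn't included in these notes, but a hardened launch along the lines described might look like the following sketch. The ollama/ollama image and port 11434 are Ollama's published defaults; the specific resource caps and security flags are illustrative assumptions rather than the video's exact invocation.

```powershell
# Run Ollama in a container with reduced privileges: the API port is bound to
# loopback only, models persist in a named volume, the root filesystem is
# read-only, and Linux capabilities are dropped. GPU passthrough assumes the
# NVIDIA Container Toolkit is installed on the host.
docker run -d `
  --name ollama `
  --gpus=all `
  -v ollama:/root/.ollama `
  -p 127.0.0.1:11434:11434 `
  --memory=8g --cpus=4 `
  --read-only --tmpfs /tmp `
  --cap-drop=ALL `
  --security-opt no-new-privileges `
  ollama/ollama

# Chat with a model from inside the container.
docker exec -it ollama ollama run deepseek-r1:7b
```

Even if the runtime were ever able to reach out to the network or filesystem, this setup limits what it could touch to the container's own volume and loopback port.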
Cornell Notes
Local execution is presented as the main way to keep DeepSeek prompts off third-party servers, but “local” should be verified. The transcript recommends running open models with LM Studio (GUI) or Ollama (CLI), then monitoring network connections to confirm inference stays on the machine. Ollama is described as listening on a local API port and not making external connections during chatting, with outside traffic occurring only for model downloads. Because the runtime could theoretically gain broader access, Docker is offered as a stronger isolation layer that limits privileges while still using the GPU. This combination—local inference plus network checks plus container isolation—aims to make private AI use safer.
- Why does running DeepSeek through a browser or app raise privacy concerns compared with running it locally?
- What hardware constraint determines which DeepSeek model sizes are practical to run locally?
- How does the transcript verify that Ollama isn’t reaching out to the internet during inference?
- What’s the role of Docker in making local AI safer?
- Why does the transcript recommend both LM Studio and Ollama, and what tradeoffs are implied?
Review Questions
- What specific network behavior would indicate that local inference is contacting external servers, and how does the transcript’s monitoring approach detect that?
- How do model parameter size and GPU VRAM affect which DeepSeek R1 variants are realistically runnable on a typical machine?
- What additional risks does containerizing Ollama with Docker mitigate compared with running it directly on the OS?
Key Points
1. Running DeepSeek locally reduces the chance that prompts are stored on third-party servers governed by external policies.
2. DeepSeek’s open-source availability enables local execution, unlike closed models that can’t be run on user hardware.
3. LM Studio offers a GUI workflow for downloading and running models, including DeepSeek variants, with GPU offload behavior depending on VRAM.
4. Ollama provides a CLI method to run local models and exposes a local API port (11434) for inference requests (see the sketch after this list).
5. Network monitoring during inference can confirm that Ollama stays local, with external connections expected mainly for model downloads.
6. Docker can further harden local AI by isolating Ollama in a container with constrained privileges while still allowing GPU access.
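To see the local API from key point 4 in action, a single generation request can be sent to the port Ollama listens on by default (a sketch: the endpoint and port are Ollama's documented defaults, while the model tag and prompt are placeholders):

```powershell
# Send one generation request to the local Ollama API on port 11434.
# The request targets localhost only; nothing leaves the machine.
$body = @{
    model  = "deepseek-r1:7b"
    prompt = "Summarize why local inference keeps prompts private."
    stream = $false
} | ConvertTo-Json

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" `
                  -Method Post -Body $body -ContentType "application/json"
```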