Run LLMs Locally With Docker Model Runner
Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Docker Desktop 4.40+ is required to use Docker Model Runner.
Briefing
Running open-source LLMs locally is now straightforward with Docker Model Runner, as long as Docker Desktop is updated and a few settings are enabled. The workflow matters because it lets developers test multiple models on their own machine—without cloud calls—while still integrating with familiar tooling. The setup is designed to work on both Mac and Windows, with Docker Desktop version 4.40+ called out as the minimum requirement.
The process starts by installing Docker Desktop and confirming the version in Docker Desktop settings. A key requirement is enabling Docker Model Runner under “Features in development.” To make the local model reachable from code, “Enable host-side TCP support” must also be turned on, after which the change is applied and Docker Desktop is restarted. For users who prefer the command line, the same capability can be enabled via a Docker Desktop command that activates Model Runner and sets the TCP port (the transcript references port 12434); a sketch of that command appears below.
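For reference, the CLI route looks roughly like the following; the command name and flag follow current Docker Desktop documentation and may differ slightly between versions, so verify against your installation.

```sh
# Enable Docker Model Runner and host-side TCP access from the terminal.
# (Command and flag names per Docker Desktop documentation; verify on your version.)
docker desktop enable model-runner --tcp 12434
```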
Once enabled, the setup can be verified with Docker Model Runner commands. “docker model status” confirms the runner is running and prints a status message. “docker model help” lists the operational commands: inspect (detailed model info), list (locally downloaded models), logs (runtime logs), pull (download a model), push (upload to Docker Hub), rm (remove a downloaded model), run (start a model), plus tag/version utilities. This command set is the backbone for managing models locally, as sketched below.
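A quick verification pass might look like this (exact output varies by Docker Desktop version):

```sh
# Confirm the runner is active.
docker model status

# Show the available subcommands (pull, run, list, inspect, logs, rm, ...).
docker model help

# List models that have already been downloaded locally.
docker model list
```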
Model availability is handled through Docker Hub. Searching for Llama-family models there reveals multiple open-source options, including variants such as Llama 3.3 and Llama 3.1. The transcript then focuses on a smaller model, ai/smollm2 (SmolLM2), described as a compact, speed-oriented language model built for efficient local use. Pulling it uses “docker model pull ai/smollm2,” and the transcript notes the download is only a few hundred MB (about 256.35 MB for the referenced variant). Running it uses “docker model run ai/smollm2,” which launches an interactive chat session where prompts return responses like a local chatbot; the commands are sketched below.
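Assuming the transcript’s model corresponds to SmolLM2 on Docker Hub as described above, the pull-and-run step reduces to two commands; the one-shot form mentioned in the next paragraph is shown as well.

```sh
# Download the model from Docker Hub (roughly 256 MB for this variant).
docker model pull ai/smollm2

# Start an interactive chat session in the terminal.
docker model run ai/smollm2

# One-shot prompt: return a single response without an interactive session.
docker model run ai/smollm2 "Summarize what Docker Model Runner does."
```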
Beyond chat mode, the runner can also take a prompt directly on the command line, returning a single response without maintaining a conversational session. A major practical advantage is compatibility with the OpenAI API library: the local model is exposed through a localhost base URL on the enabled TCP port. In the example, Python code imports the OpenAI client, points base_url at localhost:12434, and calls the chat completions endpoint with the local model name (ai/smollm2). The transcript also demonstrates streaming responses incrementally by setting stream=True, and mentions function/tool calling support via the same OpenAI-style interface; a sketch follows below.
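A minimal Python sketch of the OpenAI-compatible call is shown below. It assumes host-side TCP support on port 12434 and that the OpenAI-style endpoint is served under the /engines/v1 path; the exact path and the placeholder API key are assumptions to verify against your Docker Desktop version.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Docker Model Runner endpoint.
# The base_url path (/engines/v1) and the dummy api_key are assumptions; the
# local server ignores the key, but the client library requires a non-empty string.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",
)

# Plain chat completion against the locally pulled model.
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Explain Docker Model Runner in one sentence."}],
)
print(response.choices[0].message.content)

# Streaming variant: print tokens as they arrive instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "List three benefits of running an LLM locally."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Tool/function calling follows the same client interface (the tools parameter on chat.completions.create), though how well it works depends on the model being served.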
Overall, the takeaway is a repeatable local development loop: update Docker Desktop, enable Docker Model Runner with TCP access, pull and run an open-source model from Docker Hub, and then integrate it into applications using OpenAI-compatible client code—on Mac or Windows—without changing the developer workflow.
Cornell Notes
Docker Model Runner lets developers run open-source LLMs locally using Docker Desktop on both Mac and Windows. The minimum Docker Desktop version mentioned is 4.40, and setup requires enabling “Docker Model Runner” plus “host-side TCP support” so applications can reach the model over a localhost port (referenced as 12434). After enabling, commands like docker model status and docker model help confirm the runner is working and list the available operations (pull, run, list, inspect, logs, rm, etc.). Models are pulled from Docker Hub (example: ai/smollm2), then started with docker model run ai/smollm2 for interactive chat or direct prompt execution. The local server is OpenAI-library compatible, enabling Python code to call chat completions using the local base_url and model name, including streaming and tool-calling patterns.
What prerequisites and settings are required before Docker Model Runner can serve LLMs locally?
How can someone verify that Docker Model Runner is actually running?
Which Docker Model Runner commands are most useful for working with local LLMs?
How does the workflow connect Docker Hub models to local execution?
How does OpenAI compatibility work with a locally running model?
How can streaming and tool/function calling be used with the local setup?
Review Questions
- What exact Docker Desktop features must be enabled to allow local applications to connect to Docker Model Runner, and why is TCP support necessary?
- How would you pull and run a new Docker Hub LLM model using the Docker Model Runner command set?
- In the OpenAI-compatible Python example, which parameters determine the local server address and the model name?
Key Points
1. Docker Desktop 4.40+ is required to use Docker Model Runner.
2. Enable “Docker Model Runner” and “host-side TCP support” in Docker Desktop, then apply and restart to expose the local model over TCP.
3. Use docker model status to confirm the runner is running, and docker model help to access commands like pull, run, list, inspect, logs, and rm.
4. Pull models from Docker Hub with docker model pull <namespace/model> and start them locally with docker model run <namespace/model>.
5. The transcript’s example model, ai/smollm2 (SmolLM2), runs as an interactive chatbot and can also be prompted directly at runtime.
6. Local models are OpenAI-library compatible via a localhost base_url on the enabled TCP port (referenced as 12434), enabling chat completions from Python.
7. Streaming (stream=True) and tool/function calling patterns work through the same OpenAI-style client interface.