I’m changing how I use AI (Open WebUI + LiteLLM)
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
A single self-hosted dashboard can unify access to many major AI models—cloud-hosted chat systems like GPT and Claude, plus local models—while letting an owner control who can use what, set budgets, and monitor or restrict prompts. The approach centers on Open WebUI (an open-source, self-hosted web interface) paired with LiteLLM (used as an API-compatible proxy/gateway), turning scattered subscriptions and separate logins into one “AI hub” with admin controls.
Open WebUI is positioned as the front end: it runs on a machine you control (either a cloud VPS or on-prem hardware such as a laptop, NAS, or Raspberry Pi) and can connect to "whatever LLM" you choose. In the cloud setup, the walkthrough provisions a VPS from a hosting provider running Ubuntu 24.04, then installs Open WebUI alongside a local Llama model. Once deployed, the admin creates an initial admin account and can chat with the default local model (a small Llama 3.2 model in the walkthrough) through the familiar Open WebUI interface; it is typically slower than the largest cloud models, with speed depending on your hardware.
The key unlock is model access beyond local Llama. For cloud models, the transcript explains that Open WebUI can connect to providers via their APIs, which is often cheaper and more flexible than paying for multiple full "normie" chat plans. After creating an OpenAI API key, the key is pasted into Open WebUI's admin settings under Connections, enabling access to a range of OpenAI models (including newer ones such as GPT-4.5 mentioned in the walkthrough) inside the same interface.
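To make the "connect via API" idea concrete, here is a minimal sketch of the kind of OpenAI-compatible calls Open WebUI issues once a key is saved (Open WebUI's actual implementation differs; this just exercises the same API with the official openai Python SDK). The key and the model name gpt-4o-mini are placeholders.

```python
# Sketch: the kind of OpenAI call Open WebUI makes once a key is saved
# under Admin Settings > Connections. Requires `pip install openai`.
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # placeholder; use your real API key

# List the models the key can reach, roughly what Open WebUI shows
# in its model picker after the connection is added.
for model in client.models.list():
    print(model.id)

# A minimal chat completion against one of those models.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat model available to the key
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)
```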
But cost control is the major caveat. Billing for text LLM usage is described as token-based: tokens are roughly word fragments and punctuation marks, and pricing varies widely by model. The transcript gives example input rates: about $1.10 per million tokens for o3-mini, $15 per million for the o1 reasoning model, and up to $75 per million for GPT-4.5. Conversation length and context also inflate token usage, because the entire prior exchange is re-sent with every new message, so heavy use can quickly exceed expectations. Caching is mentioned as a potential cost reducer, but not a guarantee.
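A back-of-the-envelope sketch of why long chats get expensive, using the per-million-input-token rates quoted above. The turn count and message sizes are invented for illustration, real token counts depend on the tokenizer, and output-token costs are excluded for simplicity.

```python
# Rough cost of one chat where the full history is re-sent every turn.
# Rates are the per-1M-input-token prices quoted above; message sizes
# are made-up illustrative numbers. Output-token costs are ignored.
RATE_PER_M = {"o3-mini": 1.10, "o1": 15.00, "gpt-4.5": 75.00}

TURNS = 50                 # user/assistant exchanges in one conversation
TOKENS_PER_MESSAGE = 300   # assumed average size of each message

for model, rate in RATE_PER_M.items():
    total_input_tokens = 0
    history = 0
    for _ in range(TURNS):
        history += TOKENS_PER_MESSAGE   # the new user message
        total_input_tokens += history   # the entire history is re-sent
        history += TOKENS_PER_MESSAGE   # the reply joins the context too
    cost = total_input_tokens / 1_000_000 * rate
    print(f"{model:>8}: ~{total_input_tokens:,} input tokens -> ${cost:.2f}")
```

Under these assumptions a single 50-turn conversation consumes about 750,000 input tokens, pennies on o3-mini but roughly $56 on GPT-4.5, which is the spike the transcript warns about.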
To work around Open WebUI's limited built-in connection types (only an OpenAI-compatible connection and a local Ollama option are shown), LiteLLM is introduced as the proxy layer. Open WebUI connects to LiteLLM through its OpenAI-compatible API, while LiteLLM fans out to many other providers (Claude, Gemini, Grok, DeepSeek, and more). LiteLLM is deployed via Docker, configured with master and encryption keys, then loaded with provider API keys. Within LiteLLM, "virtual keys" can restrict which models a given user can access and can enforce monthly budgets.
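A sketch of how those two pieces fit together, assuming the LiteLLM proxy is running on its default port 4000 with a model alias claude-3-7-sonnet defined in its config; the master key, budget values, and model alias are all placeholders, not values from the video.

```python
# Sketch: mint a budget-limited LiteLLM "virtual key", then chat through
# the proxy with the standard OpenAI client. Assumes a LiteLLM proxy at
# http://localhost:4000 with a "claude-3-7-sonnet" alias configured.
import requests
from openai import OpenAI

LITELLM_URL = "http://localhost:4000"
MASTER_KEY = "sk-master-..."  # the LITELLM_MASTER_KEY set at deploy time

# 1) Mint a virtual key restricted to one model with a monthly budget.
resp = requests.post(
    f"{LITELLM_URL}/key/generate",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    json={
        "models": ["claude-3-7-sonnet"],  # only this model is allowed
        "max_budget": 10.0,               # dollars before the key is blocked
        "budget_duration": "30d",         # budget resets monthly
    },
    timeout=30,
)
virtual_key = resp.json()["key"]

# 2) Use the virtual key exactly like an OpenAI key. Open WebUI does the
# same when pointed at LiteLLM as an OpenAI-compatible connection.
client = OpenAI(base_url=f"{LITELLM_URL}/v1", api_key=virtual_key)
reply = client.chat.completions.create(
    model="claude-3-7-sonnet",
    messages=[{"role": "user", "content": "One-sentence fun fact, please."}],
)
print(reply.choices[0].message.content)
```

Because the proxy speaks the OpenAI wire format, Open WebUI needs no special LiteLLM support: it is configured like just another OpenAI connection, pointed at a different base URL and key.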
Finally, Open WebUI's user and group controls are used to apply guardrails for family and employees. The walkthrough creates a "kids" group, assigns users, limits accessible models (e.g., only Claude 3.7), and adds a system prompt instructing the assistant to help with schoolwork without doing it for them (an illustrative wording follows below). Admins can also review chat logs for monitoring, and the transcript notes that chat history visibility can be turned off for kids. A separate follow-up is teased for setting up a friendly DNS name instead of exposing a raw IP address.
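As an illustration (not the exact wording from the video), such a guardrail system prompt might read: "You are a study helper for a school-age student. Explain concepts, give hints, and check the student's reasoning, but never write essays or hand over final answers to homework problems; guide the student to work them out."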
Cornell Notes
Open WebUI turns a self-hosted server into a single web interface for chatting with many LLMs, including local models (like Llama) and cloud models (like GPT/Claude) via API connections. OpenAI access is enabled by creating an API key and adding it to Open WebUI’s admin connections, but costs are token-based and can spike with expensive models and long conversations. To add more providers than Open WebUI’s built-in options, LiteLLM is deployed as an OpenAI-compatible proxy that routes requests to Claude, Gemini, Grok, DeepSeek, and others. LiteLLM “virtual keys” plus Open WebUI user groups let admins restrict models, set monthly budgets, and apply school-focused guardrails for kids while monitoring chats for accountability.
Why does the setup rely on APIs instead of separate chat subscriptions for each model?
What is the main cost risk when using Open WebUI with cloud LLMs?
How does LiteLLM expand model options beyond Open WebUI’s limited connections?
What are “virtual keys” in LiteLLM, and how do they help with household or team control?
How are guardrails implemented for kids using Open WebUI?
What operational step is needed to make the Open WebUI instance reachable?
Review Questions
- How do token-based pricing and growing conversation context affect monthly costs when using expensive models through Open WebUI?
- What role does LiteLLM play in enabling access to Claude/Gemini/Grok/DeepSeek from the same Open WebUI interface?
- Describe two different layers of control used in the walkthrough to restrict kids’ AI use (permissions/models vs. prompt/system guardrails).
Key Points
1. Open WebUI provides a single self-hosted web interface for chatting with both local LLMs and cloud LLMs.
2. Cloud model access is enabled by adding provider API keys (e.g., OpenAI) into Open WebUI's admin connections.
3. LLM usage costs are token-based and vary sharply by model, so long chats and expensive models can drive bills up quickly.
4. LiteLLM acts as an OpenAI-compatible proxy/gateway that lets Open WebUI connect to many providers beyond OpenAI and local models.
5. LiteLLM virtual keys can restrict model access and enforce monthly budgets per user or group.
6. Open WebUI user groups and system prompts can implement family/team guardrails, including "help but don't do the work" rules for school use.
7. A public IP is used to reach the service initially, with DNS setup suggested as a separate follow-up step.