
I’m changing how I use AI (Open WebUI + LiteLLM)

NetworkChuck · 5 min read

Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Open WebUI provides a single self-hosted web interface for chatting with both local LLMs and cloud LLMs.

Briefing

A single self-hosted dashboard can unify access to many major AI models—cloud-hosted chat systems like GPT and Claude, plus local models—while letting an owner control who can use what, set budgets, and monitor or restrict prompts. The approach centers on Open WebUI (an open-source, self-hosted web interface) paired with LiteLLM (used as an API-compatible proxy/gateway), turning scattered subscriptions and separate logins into one “AI hub” with admin controls.

Open WebUI is positioned as the front end: it runs on a machine you control (either a cloud VPS or on-prem hardware such as a laptop, NAS, or Raspberry Pi) and can connect to “whatever LLM” you choose. In the cloud setup, the workflow provisions a VPS from a hosting provider running Ubuntu 24.04, then installs Open WebUI alongside Ollama, the runtime that serves local Llama models. Once deployed, the admin creates an initial admin account and can chat with a default local model (called “Llama 3.1 2B” in the walkthrough) through the familiar Open WebUI interface—typically slower than the largest cloud models, and dependent on your hardware.
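As a concrete sketch of that install step: Open WebUI’s documentation describes a one-line Docker deployment along these lines. The port mapping and image tag below reflect the project’s docs at the time of writing; verify the exact flags against the current README before running.

```shell
# Minimal Open WebUI deployment sketch (flags per the project docs).
# -p 80:8080 exposes the UI on host port 80; the named volume persists
# users, chats, and settings across container restarts.
docker run -d -p 80:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Once the container is up, browsing to the server’s IP brings up the first-run admin account creation screen.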

The key unlock is model access beyond local Llama. For cloud models, the transcript explains that Open WebUI can connect to providers via APIs, which is often cheaper and more flexible than paying for multiple full “normie” chat plans. After creating an OpenAI API key, the key is pasted into Open WebUI’s admin settings under connections, enabling access to a range of OpenAI models—including newer ones such as “4.5” mentioned in the walkthrough—inside the same interface.

But cost control is the major caveat. Billing for text LLM usage is described as token-based: tokens are roughly word fragments and punctuation, and pricing varies widely by model. The transcript gives example rates (e.g., $1.10 per 1M tokens for “o3-mini,” $15 per 1M tokens for “o1 reasoning,” and up to $75 per 1M tokens for “4.5,” with input costs called out). Conversation length and “context” increase token usage because prior messages are repeatedly sent, so heavy use can quickly exceed expectations. Caching is mentioned as a potential cost reducer, but not a guarantee.
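The context-growth effect can be sanity-checked with quick shell arithmetic. This is a hypothetical back-of-envelope, not real billing code: the 200-tokens-per-message figure is an assumption, and the $75/1M rate is the transcript’s example input price for “4.5.”

```shell
# If every request resends all prior messages, cumulative input tokens
# after N turns is roughly per_msg * N*(N+1)/2 (quadratic, not linear).
turns=40
tokens_per_msg=200
total=$(( tokens_per_msg * turns * (turns + 1) / 2 ))
# Apply the transcript's example rate of $75 per 1M input tokens.
cost=$(awk -v t="$total" 'BEGIN { printf "%.2f", t / 1000000 * 75 }')
echo "after $turns turns: $total input tokens, ~\$$cost"
```

Forty modest turns already total 164,000 input tokens—over $12 at the premium rate—which is why long chats on expensive models surprise people.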

To solve Open WebUI’s limited built-in connection types (only OpenAI and a local option are shown), LiteLLM is introduced as the proxy layer. Open WebUI connects to LiteLLM using an OpenAI-compatible API, while LiteLLM fans out to many other providers (Claude, Gemini, Grok, DeepSeek, and more). LiteLLM is deployed via Docker, configured with master and encryption keys, then set up with provider API keys. Within LiteLLM, “virtual keys” can restrict which models a given user can access and can enforce monthly budgets.
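To make the LiteLLM configuration concrete: the proxy’s config maps friendly model names to provider-specific models and API keys. The model identifiers and layout below are illustrative assumptions based on LiteLLM’s documented config format, not taken from the video.

```yaml
# config.yaml — illustrative LiteLLM proxy config (model names are examples)
model_list:
  - model_name: claude-3-7-sonnet
    litellm_params:
      model: anthropic/claude-3-7-sonnet-20250219
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GEMINI_API_KEY
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

Open WebUI then needs only a single OpenAI-compatible connection pointed at the LiteLLM endpoint, and every entry in `model_list` appears in its model picker.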

Finally, Open WebUI’s user and group controls are used to apply guardrails for family and employees. The walkthrough creates a “kids” group, assigns users, limits accessible models (e.g., only Claude 3.7), and adds a system prompt instructing the assistant to help with schoolwork without doing it for them. Admins can also review chat logs for monitoring, and the transcript notes that chat history visibility can be turned off for kids. A separate follow-up is teased for setting up a friendly DNS name instead of exposing an IP address.

Cornell Notes

Open WebUI turns a self-hosted server into a single web interface for chatting with many LLMs, including local models (like Llama) and cloud models (like GPT/Claude) via API connections. OpenAI access is enabled by creating an API key and adding it to Open WebUI’s admin connections, but costs are token-based and can spike with expensive models and long conversations. To add more providers than Open WebUI’s built-in options, LiteLLM is deployed as an OpenAI-compatible proxy that routes requests to Claude, Gemini, Grok, DeepSeek, and others. LiteLLM “virtual keys” plus Open WebUI user groups let admins restrict models, set monthly budgets, and apply school-focused guardrails for kids while monitoring chats for accountability.

Why does the setup rely on APIs instead of separate chat subscriptions for each model?

API access is described as “pay as you go” (or usage-based) rather than a fixed monthly plan. That matters because providers often include access to newly released models through API keys, even when the cheapest chat plans don’t. It can also reduce cost when multiple people aren’t heavy users—paying per usage can be cheaper than buying full plans for everyone.

What is the main cost risk when using Open WebUI with cloud LLMs?

The transcript highlights token-based billing. Tokens are word fragments and punctuation, and pricing depends on the model. Expensive models (example given: “4.5” at $75 per 1M tokens for input) can become costly quickly, especially because conversation “context” grows as prior messages are resent with each request. The result: casual estimates can break down for power users or long chats.

How does LiteLLM expand model options beyond Open WebUI’s limited connections?

Open WebUI’s connections shown are essentially OpenAI API and a local LLM option. LiteLLM acts as a gateway: Open WebUI talks to LiteLLM using an OpenAI-compatible API, and LiteLLM then connects to many other providers (Claude, Gemini, Grok, DeepSeek, etc.) using their respective API keys. This is what enables “one interface” access across multiple ecosystems.
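“OpenAI-compatible” means the request shape is identical no matter which backend model ultimately answers. A hedged sketch, assuming LiteLLM is listening on localhost port 4000 and a model named `claude-3-7-sonnet` is configured (both are assumptions):

```shell
# Same /v1/chat/completions shape the OpenAI API uses; only the
# "model" field and the key change. LiteLLM routes it to Anthropic.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3-7-sonnet",
    "messages": [{"role": "user", "content": "hello"}]
  }'
```

Swapping `"model"` to a Gemini or DeepSeek entry changes the provider without touching the client.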

What are “virtual keys” in LiteLLM, and how do they help with household or team control?

Virtual keys are per-user/per-purpose API keys created inside LiteLLM. They can restrict which models a user can access (e.g., only Claude 3.7 and a smaller set of models) and can optionally set a monthly budget and expiration. Those virtual keys are then used in Open WebUI so each user’s access is enforced centrally.
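LiteLLM exposes a key-management endpoint for creating these. The exact fields can vary by version, so treat this as a sketch of the documented `/key/generate` call rather than a verbatim reproduction of the walkthrough; it requires the master key.

```shell
# Create a virtual key restricted to one model, with a $10 budget
# that applies for a 30-day window (field names per LiteLLM docs).
curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["claude-3-7-sonnet"],
    "max_budget": 10,
    "duration": "30d"
  }'
```

The returned key is what gets pasted into that user’s Open WebUI connection, so the restrictions are enforced at the proxy regardless of what the UI shows.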

How are guardrails implemented for kids using Open WebUI?

The walkthrough creates a “kids” group, adds users, and assigns model permissions (only selected models). It also applies a system prompt that frames the assistant as a “school helper” with explicit rules: guide without cheating, don’t write essays or solve problems outright, and keep responses focused on school-related subjects. The admin can then test by logging in as a kid user and verifying the assistant refuses to do non-permitted tasks.
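As an illustration only (the video’s exact wording isn’t reproduced here), a system prompt implementing those rules might read:

```text
You are a school helper for a student. Guide them toward answers with
hints, explanations, and leading questions, but never write essays or
solve homework problems outright. Politely decline any request that is
not related to schoolwork.
```

Because the prompt is set at the group level, every chat a kid starts inherits it automatically.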

What operational step is needed to make the Open WebUI instance reachable?

After Open WebUI is installed on the VPS, it’s reached by entering the server’s public IP address in a browser; the walkthrough maps the container to host port 80, so no port number is needed in the URL. The admin then completes the Open WebUI setup by creating the admin account and logging in.

Review Questions

  1. How do token-based pricing and growing conversation context affect monthly costs when using expensive models through Open WebUI?
  2. What role does LiteLLM play in enabling access to Claude/Gemini/Grok/DeepSeek from the same Open WebUI interface?
  3. Describe two different layers of control used in the walkthrough to restrict kids’ AI use (permissions/models vs. prompt/system guardrails).

Key Points

  1. Open WebUI provides a single self-hosted web interface for chatting with both local LLMs and cloud LLMs.

  2. Cloud model access is enabled by adding provider API keys (e.g., OpenAI) into Open WebUI’s admin connections.

  3. LLM usage costs are token-based and vary sharply by model, so long chats and expensive models can drive bills quickly.

  4. LiteLLM acts as an OpenAI-compatible proxy/gateway that lets Open WebUI connect to many providers beyond OpenAI and local models.

  5. LiteLLM virtual keys can restrict model access and enforce monthly budgets per user or group.

  6. Open WebUI user groups and system prompts can implement family/team guardrails, including “help but don’t do the work” rules for school use.

  7. A public IP is used to reach the service initially, with DNS setup suggested as a separate follow-up step.

Highlights

Open WebUI + LiteLLM creates one “AI hub” where local Llama and multiple cloud providers can be accessed from the same interface.
Token-based billing plus conversation context growth is the biggest cost trap—expensive models can become costly fast.
LiteLLM’s OpenAI-compatible gateway is the mechanism that unlocks Claude, Gemini, Grok, and DeepSeek inside Open WebUI.
Virtual keys enable per-user model whitelists and monthly budgets, making household access manageable.
System prompts and model permissions together can steer an assistant toward “guidance not cheating” for kids.

Topics

Mentioned

  • LLM
  • VPS
  • API
  • DNS
  • GPU
  • NAS
  • KVM