My Local AI Voice Assistant (I Replaced Alexa!!)
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Home Assistant can run a fully local voice pipeline by combining Whisper (STT), Piper (TTS), and Open Wake Word (wake phrase) with Assist for intent-to-action control.
Briefing
A fully local voice assistant is now practical for home automation: Home Assistant can run an offline wake word, speech-to-text, intent handling, and text-to-speech on your own hardware—then scale those pieces across multiple devices using the Wyoming protocol. The payoff is control without cloud dependence, plus the ability to swap in a local “brain” like Llama 3 for more capable, context-aware conversations that can drive real actions in the house.
The build starts with Home Assistant on a local device (a Raspberry Pi in the demo) and adds a “voice pipeline” through add-ons. Whisper provides offline speech-to-text, Piper handles offline text-to-speech, and Open Wake Word listens for a chosen wake phrase. Home Assistant’s Assist layer then connects those audio components to home automation actions—turning phrases into commands like switching lights—while keeping everything on the local network. Early tests show the system works, but it requires careful phrasing and responds with noticeable latency, which sets up the later improvements.
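The Assist step, mapping a transcribed phrase to a device action, can be pictured as a small intent matcher. This is an illustrative sketch only, not Home Assistant's actual implementation; the service and entity names below are hypothetical:

```python
import re

# Illustrative sketch only: Home Assistant's Assist uses a much richer
# intent grammar. Service and entity names here are hypothetical.
PATTERN = re.compile(r"turn (on|off) the (\w+) light")

def handle(phrase: str):
    """Map a transcribed phrase to a (service, entity) pair, or None if no intent matches."""
    m = PATTERN.search(phrase.lower())
    if not m:
        return None
    state, room = m.groups()
    return (f"light.turn_{state}", f"light.{room}")

print(handle("Turn on the kitchen light"))  # → ('light.turn_on', 'light.kitchen')
```

The rigidity of patterns like this is exactly why early tests need careful phrasing, and why swapping in an LLM later makes the assistant feel more natural.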
Next comes scaling beyond a single box. The Wyoming protocol turns extra hardware into “satellites” that can listen and speak while delegating the heavy lifting back to Home Assistant. A second Raspberry Pi is configured with a ReSpeaker 2-Mics Pi HAT, then the Wyoming satellite software is installed and run as a service so it stays online. Home Assistant connects to it over the network, and the assistant can control devices from anywhere in the house. The demo also adds an LED status behavior using the ReSpeaker’s pixel ring so the user can tell when the assistant is actively listening.
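Wyoming itself is deliberately lightweight: each event is a JSON header on its own line, optionally followed by a binary payload whose length the header announces. A simplified sketch of that framing (the real protocol carries additional fields; the event data below is illustrative):

```python
import json

def encode_event(event_type: str, data: dict, payload: bytes = b"") -> bytes:
    """Frame a Wyoming-style event: one JSON header line, then raw payload bytes."""
    header = {"type": event_type, "data": data}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode() + b"\n" + payload

def decode_event(buf: bytes):
    """Split a framed event back into (header dict, payload bytes)."""
    line, _, rest = buf.partition(b"\n")
    header = json.loads(line)
    payload = rest[: header.get("payload_length", 0)]
    return header, payload

# Round-trip a fake audio chunk, as a satellite would stream it to the server.
framed = encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00\x01")
header, payload = decode_event(framed)
```

Because the framing is this simple, any device that can open a TCP socket can act as a satellite, which is what makes the cheap-Pi-plus-mic-HAT setup work.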
The real leap in capability arrives when the “conversation agent” is replaced with a local large language model. Instead of relying on Home Assistant’s default conversational behavior, the system points to Ollama running Llama 3.2 (downloaded and served locally). With the LLM in place, the assistant can answer factual questions and—crucially—maintain context across turns, enabling follow-up commands like turning a light back on after a prior interaction.
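Ollama serves its models over a local HTTP API (port 11434 by default), and multi-turn context works by resending the conversation history with every request. A minimal sketch of that pattern, assuming a locally running Ollama instance with `llama3.2` pulled; the network call is wrapped in a function rather than executed here:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_payload(history: list, user_text: str, model: str = "llama3.2") -> dict:
    """Append the new user turn and build an /api/chat request body."""
    messages = history + [{"role": "user", "content": user_text}]
    return {"model": model, "messages": messages, "stream": False}

def chat(history: list, user_text: str) -> list:
    """Send the full history to Ollama so the model keeps conversational context."""
    body = json.dumps(build_payload(history, user_text)).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        reply = json.loads(resp.read())["message"]
    return history + [{"role": "user", "content": user_text}, reply]

# A follow-up like "turn it back on" only makes sense if the prior turns are resent.
history = [{"role": "user", "content": "Turn off the office light."},
           {"role": "assistant", "content": "Done, the office light is off."}]
payload = build_payload(history, "Actually, turn it back on.")
```

Resending the history each turn is what lets the assistant resolve "it" to the office light in the follow-up command.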
To reduce bottlenecks, the pipeline is offloaded to more powerful machines using Wyoming containers. On a Windows laptop, Docker runs Wyoming Whisper (speech-to-text) and Wyoming Piper (text-to-speech), and Home Assistant switches its voice pipeline endpoints to those remote services. The demo then adds a second LLM server (“Terry,” a dedicated AI server) so multiple models can be used together. The result is a fast, fully self-hosted assistant that can control smart lighting and handle more natural conversation, all without ever touching the cloud.
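Once STT and TTS live on another machine, Home Assistant only needs to reach the Wyoming ports over the local network (10300 for Whisper and 10200 for Piper are the commonly used defaults; the laptop's IP below is hypothetical). A quick reachability check:

```python
import socket

def wyoming_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the Wyoming service succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical laptop running the Dockerized Wyoming containers.
for name, port in [("whisper", 10300), ("piper", 10200)]:
    print(name, "reachable:", wyoming_reachable("192.168.1.50", port, timeout=0.5))
```

A check like this is handy before pointing Home Assistant's pipeline at the new endpoints, since a firewall on the laptop is a common reason the add-on swap silently fails.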
The build ends by noting the gaps that remain compared with Alexa: custom wake word training (e.g., “Terry”) and custom voice generation aren’t fully solved yet. The next step is training a new wake word using Open Wake Word’s Google Colab workflow and uploading the resulting model files into Home Assistant via the Samba add-on. After hours of troubleshooting, custom voice cloning is flagged as the next frontier for a future video. Overall, the message is clear: with Home Assistant, Wyoming, and local LLM tooling, a cloud-free home assistant can be assembled piece by piece, then upgraded as hardware and models improve.
Cornell Notes
The core idea is building a voice assistant that stays local end-to-end: wake word detection, speech-to-text, intent handling, and text-to-speech run on your own devices, while the “brain” can be a local LLM served via Ollama. Home Assistant orchestrates the pipeline using add-ons like Whisper (STT), Piper (TTS), and Open Wake Word (wake phrase), then routes recognized intents to home automation actions. The Wyoming protocol lets additional Raspberry Pis act as remote “satellites” for microphones/speakers, while Docker containers can offload STT/TTS to faster machines. Swapping Home Assistant’s default conversation agent for Llama 3.2 via Ollama enables more capable, context-aware responses that can drive real actions like controlling lights.
How does Home Assistant turn raw speech into actions without using cloud services?
What is the Wyoming protocol used for in this setup?
Why does replacing the conversation agent with Ollama + Llama 3.2 matter?
How does the system speed up by offloading STT and TTS to other machines?
What remaining limitations are highlighted, and what’s the next technical step?
Review Questions
- What components make up the local voice pipeline in Home Assistant, and what role does each one play?
- How does Wyoming enable adding remote microphone/speaker hardware without rebuilding the entire assistant?
- What changes when the conversation agent is switched from Home Assistant’s default to an Ollama-served Llama 3.2 model?
Key Points
1. Home Assistant can run a fully local voice pipeline by combining Whisper (STT), Piper (TTS), and Open Wake Word (wake phrase) with Assist for intent-to-action control.
2. Wyoming protocol turns extra devices into voice “satellites,” letting a Raspberry Pi with a mic/speaker handle listening and speaking while Home Assistant orchestrates the rest.
3. Using Ollama to serve Llama 3.2 upgrades the assistant’s conversational ability and improves multi-turn context for follow-up commands.
4. Dockerized Wyoming Whisper and Wyoming Piper let STT/TTS run on faster hardware, and Home Assistant can switch endpoints to those remote services.
5. Multiple LLM servers can be integrated by updating the voice assistant’s conversation agent settings to point to different Ollama instances (e.g., “Terry”).
6. Custom wake word training for a name like “Terry” requires training a new model (TF Lite/ONNX) and uploading it into Home Assistant via Samba.
7. Custom voice generation remains unsolved in this build and is slated for a follow-up effort after wake word training works.