n8n Now Runs My ENTIRE Homelab
Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A home lab can be run like an always-on IT desk by pairing n8n with an AI agent (“Terry”) that monitors services, troubleshoots failures, and—after explicit approval—executes fixes across Docker, servers, and network tools. The core idea is practical: start with tightly scoped permissions, teach the agent repeatable troubleshooting steps using real CLI/API tools, then expand capability only when guardrails and human-in-the-loop approvals are in place.
The build begins with “Baby Terry” and upgrades to a version that can do more than check whether a website responds. Terry is given access to concrete tools: an HTTP request tool to verify a site is up, and an SSH-based command runner (implemented via an n8n SSH node converted into a subworkflow) to inspect the host and manage Docker containers. The workflow is structured around a simple loop: Terry checks the website, and if it’s down, he runs the same commands a human would—first confirming container status with docker ps, then using docker inspect and docker logs to identify why the container failed. Early tests show Terry can detect a stopped container and report exit details, and the agent improves when prompts explicitly require log review.
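The check-then-diagnose loop described above can be sketched in Python. This is a minimal illustration, not the video's actual n8n wiring: the container name, URL handling, and helper names are all hypothetical, and the docker commands mirror the ones Terry is taught (docker ps, docker inspect, docker logs).

```python
# Sketch of Terry's diagnosis loop: HTTP check first, then Docker evidence.
# Container/site names are illustrative, not from the video.
import subprocess
import urllib.error
import urllib.request


def site_is_up(url: str, timeout: float = 5.0) -> bool:
    """HTTP check: treat any 2xx/3xx response as 'up'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False


def stopped_containers(ps_output: str) -> list[str]:
    """Parse `docker ps -a --format '{{.Names}}\t{{.Status}}'` output and
    return the names of containers that are not running."""
    stopped = []
    for line in ps_output.strip().splitlines():
        name, status = line.split("\t", 1)
        if not status.startswith("Up"):
            stopped.append(name)
    return stopped


def diagnose(container: str) -> str:
    """Gather the same evidence a human would: exit code plus recent logs."""
    inspect = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.ExitCode}}", container],
        capture_output=True, text=True,
    )
    logs = subprocess.run(
        ["docker", "logs", "--tail", "20", container],
        capture_output=True, text=True,
    )
    return f"exit code: {inspect.stdout.strip()}\nlogs:\n{logs.stderr or logs.stdout}"
```

The key move, as the Briefing notes, is that the prompt must explicitly require the log-review step; otherwise the agent stops at "container is down" without gathering the why.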
From there, Terry shifts from "chat-driven" to scheduled operations. A schedule trigger runs every five minutes, but the workflow must be adapted because the agent's memory and user prompt were originally tied to chat sessions. The solution is to inject a prompt and a chat/session ID via set-field nodes, then route Terry's results to Telegram for notifications. To avoid noisy alerts, Terry is forced into structured output (JSON fields such as a websiteUp boolean and a message string). An If node filters outcomes so Telegram messages are sent only when the website is down.
The next leap is repair, not just diagnosis. Terry’s prompt is modified so that when the website is down, he attempts a docker start and then re-checks the site. A more challenging test introduces a port conflict by running a Python server on the same port as the Dockerized site. Terry initially fails in a controlled way—he sticks to known playbooks—then a “more powerful” prompt allows broader CLI troubleshooting. That expansion reveals a key risk: the agent can take destructive actions (even stopping the wrong process), which leads to the video’s main safety mechanism.
Human-in-the-loop approval is added using an n8n “human in the loop” Telegram step. Terry must request permission before commands that modify the system. The agent outputs structured fields such as needsApproval and commandsRequested; the workflow sends an approval request to Telegram, and only after approval does it loop the approved instruction back into Terry. With this guardrail, Terry can fix the port-conflict scenario by identifying the conflicting Python process, terminating it, restarting the Docker container, and confirming the website is operational—while the human remains in control.
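The approval gate itself is a small piece of logic over the structured fields. A sketch, using the needsApproval and commandsRequested field names from the video and assuming the Telegram step resolves to a simple yes/no:

```python
# Sketch of the human-in-the-loop gate: system-modifying commands run only
# after explicit approval; the safe default is to require approval.
import json


def gate_commands(agent_output: str, approved: bool) -> list[str]:
    """Return the commands allowed to execute given the approval decision."""
    data = json.loads(agent_output)
    commands = data.get("commandsRequested", [])
    if not data.get("needsApproval", True):  # default to requiring approval
        return commands  # agent declared these read-only
    return commands if approved else []
```

Note the fail-safe default: if the agent omits needsApproval, the gate assumes approval is required, so a malformed response cannot silently authorize destructive commands.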
Finally, Terry is promoted beyond the sandbox: secure remote access is enabled via Twingate so the cloud-hosted n8n instance can reach the home lab. Terry is then reconfigured with new “personas” and tools for UniFi (API-based bandwidth analysis), Proxmox (API/CLI for VM inventory), and Plex (API for active streams and control). The takeaway is less about one perfect workflow and more about a repeatable pattern: connect monitoring to an agent, teach troubleshooting with real commands, constrain actions with structured outputs and approvals, and scale to multiple systems as documentation, sub-agents, and help-desk workflows are added in future steps.
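Each new persona ultimately reduces to a small set of API "tools." A sketch of what those tool definitions look like: the Proxmox and Plex paths match their documented REST APIs, but the UniFi path varies by controller version and should be treated as an assumption, as should all host names and token values shown.

```python
# Sketch of per-system API tools behind the personas. Returns (url, headers)
# pairs ready for an HTTP request node. Hosts/tokens are hypothetical.
from urllib.parse import urljoin


def proxmox_vm_inventory(base: str, node: str, token: str) -> tuple[str, dict]:
    """GET the VM list for one Proxmox node, using API-token auth."""
    return urljoin(base, f"/api2/json/nodes/{node}/qemu"), \
        {"Authorization": f"PVEAPIToken={token}"}


def plex_active_streams(base: str, token: str) -> tuple[str, dict]:
    """GET current playback sessions from Plex."""
    return urljoin(base, "/status/sessions"), {"X-Plex-Token": token}


def unifi_client_stats(base: str, site: str = "default") -> tuple[str, dict]:
    """GET per-client stats from a UniFi controller (path is version-dependent)."""
    return urljoin(base, f"/api/s/{site}/stat/sta"), {}
```

The pattern the Briefing describes is visible here: scaling to a new system means adding one more narrow, well-understood tool plus a role-specific prompt, not rebuilding the workflow.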
Cornell Notes
n8n can host an AI IT agent that monitors a service, troubleshoots failures using real tools (HTTP checks, SSH/CLI, Docker commands), and fixes issues only after human approval. The workflow starts with narrow permissions: Terry verifies a website via HTTP, then uses SSH to run docker ps/inspect/logs when the site is down. Scheduled triggers run the checks every five minutes, while structured output (JSON) prevents noisy alerts by sending Telegram messages only when something is wrong. When repair is enabled, Terry can attempt fixes (e.g., restarting a Docker container), but a port-conflict test shows the danger of giving unrestricted command power. Human-in-the-loop approval via Telegram adds guardrails so Terry can request specific commands, wait for approval, and then execute them safely.
How does Terry learn to troubleshoot a “website down” problem without guessing?
Why does the schedule trigger require extra wiring compared with chat-based triggering?
What does structured output accomplish in the monitoring workflow?
How does the human-in-the-loop approval prevent harmful actions?
What was the purpose of the port-conflict test, and what did it reveal?
How does the setup scale from one service to multiple home-lab systems?
Review Questions
- What inputs must be injected (and why) when replacing a chat trigger with a schedule trigger for an AI agent that uses memory?
- Describe how structured output changes the way alerts are routed compared with free-text responses.
- Why is human-in-the-loop especially important once the agent is allowed to execute fixes, and how does the workflow implement it with Telegram?
Key Points
1. Terry becomes useful by being taught repeatable troubleshooting steps using concrete tools: HTTP checks, SSH/CLI command execution, and Docker commands.
2. Scheduled monitoring requires injecting both a user prompt and a session/chat ID so the agent's memory and prompt wiring still work.
3. Structured output (JSON) enables reliable alert filtering: Telegram notifications can trigger only when a boolean condition indicates a real problem.
4. Allowing automated fixes without guardrails can cause harmful actions; human-in-the-loop approval prevents executing system-modifying commands without consent.
5. A port-conflict scenario demonstrates the difference between "known playbooks" and genuinely adaptive troubleshooting, and it motivates broader prompts plus approvals.
6. Secure remote access (via Twingate) lets a cloud-hosted n8n agent operate on a local home lab continuously.
7. The same agent pattern scales across UniFi, Proxmox, Plex, and other systems by adding the right API/CLI tools and role-specific prompts.