
n8n Now Runs My ENTIRE Homelab

NetworkChuck · 6 min read

Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Terry becomes useful by being taught repeatable troubleshooting steps using concrete tools: HTTP checks, SSH/CLI command execution, and Docker commands.

Briefing

A home lab can be run like an always-on IT desk by pairing n8n with an AI agent (“Terry”) that monitors services, troubleshoots failures, and—after explicit approval—executes fixes across Docker, servers, and network tools. The core idea is practical: start with tightly scoped permissions, teach the agent repeatable troubleshooting steps using real CLI/API tools, then expand capability only when guardrails and human-in-the-loop approvals are in place.

The build begins with “Baby Terry” and upgrades to a version that can do more than check whether a website responds. Terry is given access to concrete tools: an HTTP request tool to verify a site is up, and an SSH-based command runner (implemented via an n8n SSH node converted into a subworkflow) to inspect the host and manage Docker containers. The workflow is structured around a simple loop: Terry checks the website, and if it’s down, he runs the same commands a human would—first confirming container status with docker ps, then using docker inspect and docker logs to identify why the container failed. Early tests show Terry can detect a stopped container and report exit details, and the agent improves when prompts explicitly require log review.
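The diagnostic loop described above can be sketched in plain Python. In the video this logic lives inside an n8n agent that calls an SSH subworkflow; here a function illustrates the same ordered playbook, with an injectable `run` callable standing in for SSH command execution (the function and parameter names are assumptions of this sketch, not taken from the workflow):

```python
# Sketch of Terry's "website down" playbook: docker ps -> inspect -> logs,
# always in that order. The `run` callable stands in for the n8n SSH
# subworkflow and is an assumption of this sketch.
def diagnose_container(name, run):
    """Return a diagnosis dict built from the same commands a human would run."""
    steps = []
    ps = run(f"docker ps --filter name={name} --format '{{{{.Status}}}}'")
    steps.append("docker ps")
    if ps.strip():  # container is listed, i.e. running
        return {"running": True, "steps": steps, "detail": ps.strip()}
    # Container not running: gather the exit code and recent logs.
    exit_code = run(f"docker inspect --format '{{{{.State.ExitCode}}}}' {name}")
    steps.append("docker inspect")
    logs = run(f"docker logs --tail 20 {name}")
    steps.append("docker logs")
    return {"running": False, "steps": steps,
            "exit_code": exit_code.strip(), "logs": logs}
```

Because the command runner is injected, the playbook can be exercised with a fake runner before ever touching a real host, which mirrors how the video verifies that Terry actually consulted the logs.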

From there, Terry shifts from “chat-driven” to scheduled operations. A schedule trigger runs every five minutes, but the workflow must be adapted because the agent’s memory and user prompt were originally tied to chat sessions. The solution is to inject a prompt and a chat/session ID via set-field nodes, then route Terry’s results to Telegram for notifications. To avoid noisy alerts, Terry is forced into structured output (JSON fields such as a boolean websiteUp and a text message). An if node filters outcomes so Telegram messages are sent only when the website is down.

The next leap is repair, not just diagnosis. Terry’s prompt is modified so that when the website is down, he attempts a docker start and then re-checks the site. A more challenging test introduces a port conflict by running a Python server on the same port as the Dockerized site. Terry initially fails in a controlled way—he sticks to known playbooks—then a “more powerful” prompt allows broader CLI troubleshooting. That expansion reveals a key risk: the agent can take destructive actions (even stopping the wrong process), which leads to the video’s main safety mechanism.

Human-in-the-loop approval is added using an n8n “human in the loop” Telegram step. Terry must request permission before commands that modify the system. The agent outputs structured fields such as needsApproval and commandsRequested; the workflow sends an approval request to Telegram, and only after approval does it loop the approved instruction back into Terry. With this guardrail, Terry can fix the port-conflict scenario by identifying the conflicting Python process, terminating it, restarting the Docker container, and confirming the website is operational—while the human remains in control.

Finally, Terry is promoted beyond the sandbox: secure remote access is enabled via Twingate so the cloud-hosted n8n instance can reach the home lab. Terry is then reconfigured with new “personas” and tools for UniFi (API-based bandwidth analysis), Proxmox (API/CLI for VM inventory), and Plex (API for active streams and control). The takeaway is less about one perfect workflow and more about a repeatable pattern: connect monitoring to an agent, teach troubleshooting with real commands, constrain actions with structured outputs and approvals, and scale to multiple systems as documentation, sub-agents, and help-desk workflows are added in future steps.

Cornell Notes

n8n can host an AI IT agent that monitors a service, troubleshoots failures using real tools (HTTP checks, SSH/CLI, Docker commands), and fixes issues only after human approval. The workflow starts with narrow permissions: Terry verifies a website via HTTP, then uses SSH to run docker ps/inspect/logs when the site is down. Scheduled triggers run the checks every five minutes, while structured output (JSON) prevents noisy alerts by sending Telegram messages only when something is wrong. When repair is enabled, Terry can attempt fixes (e.g., restarting a Docker container), but a port-conflict test shows the danger of giving unrestricted command power. Human-in-the-loop approval via Telegram adds guardrails so Terry can request specific commands, wait for approval, and then execute them safely.

How does Terry learn to troubleshoot a “website down” problem without guessing?

Terry is taught a deterministic sequence using tools and prompts. First, an HTTP request tool checks whether the website responds. If it’s down, Terry uses an SSH-based command tool (implemented as an SSH node converted into a subworkflow) to run docker ps to confirm whether the container is running. The prompt is then expanded so Terry also runs docker inspect and docker logs, and the workflow verifies log usage by checking which commands appear in the agent’s execution history. This turns troubleshooting into a repeatable playbook rather than free-form reasoning.

Why does the schedule trigger require extra wiring compared with chat-based triggering?

Chat triggering provides an implicit user message and a chat/session ID that the agent’s memory uses. When switching to a schedule trigger, those inputs don’t exist automatically, so Terry fails with “brain fried” errors. The fix is to insert set-field nodes between the schedule trigger and the AI agent: one field injects the user prompt (e.g., “check if the website is up”), and another injects a chat ID so the memory system can track context. The agent’s prompt configuration is also redirected so it uses the injected prompt rather than the original chat node.
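The two injections can be pictured as a single payload handed to the agent node. This is a minimal sketch: the field names `chatInput` and `sessionId` follow common n8n AI-agent conventions but are assumptions here, not confirmed by the video:

```python
# Sketch of what the two set-field nodes supply before the agent runs.
# Field names (chatInput, sessionId) are assumed conventions, not taken
# from the video's workflow.
def build_agent_input(prompt="check if the website is up",
                      session_id="homelab-monitor"):
    """Mimic the set-field nodes: inject the user prompt and a fixed
    session ID so the agent's memory can track context across runs."""
    return {"chatInput": prompt, "sessionId": session_id}
```

Note that the session ID is deliberately fixed rather than random: a new ID on every scheduled run would give the agent amnesia between checks.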

What does structured output accomplish in the monitoring workflow?

Structured output forces Terry to return machine-readable JSON instead of narrative text. The workflow expects fields like websiteUp (boolean) and message. That enables an if node to filter notifications: Telegram messages are sent only on the false branch (websiteUp = false). Without structured output, the workflow would need brittle text parsing to decide whether to alert.
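The if-node filter reduces to a few lines once the output is JSON. A minimal sketch, assuming the two fields named in the summary (`websiteUp`, `message`):

```python
import json

def should_alert(agent_output: str):
    """Parse the agent's structured JSON and decide whether to notify.
    Mirrors the if node: alert (send to Telegram) only when websiteUp
    is false, and carry the message along for the notification text."""
    data = json.loads(agent_output)
    return (not data["websiteUp"], data.get("message", ""))
```

This is exactly the brittle-parsing problem structured output avoids: the branch condition reads a boolean, not a sentence.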

How does the human-in-the-loop approval prevent harmful actions?

When Terry is allowed to fix issues, unrestricted command execution can lead to destructive mistakes (e.g., stopping the wrong process). Human-in-the-loop adds a Telegram “send and wait for response” step where Terry must request explicit approval before system-modifying commands. Terry outputs fields such as needsApproval and commandsRequested; the workflow sends the approval request, and only after approval does it loop the approved instruction back into Terry for execution. This keeps the agent capable while retaining human control.
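The approval gate can be sketched as a function that sits between the agent’s output and the command executor. The field names `needsApproval` and `commandsRequested` come from the summary; the `ask_human` callable stands in for n8n’s Telegram “send and wait for response” step and is an assumption of this sketch:

```python
# Sketch of the human-in-the-loop gate: read-only output passes through,
# system-modifying commands run only after explicit human approval.
# `ask_human` stands in for the Telegram send-and-wait step.
def gate_commands(agent_output: dict, ask_human):
    """Return the list of commands that may actually be executed."""
    requested = agent_output.get("commandsRequested", [])
    if not agent_output.get("needsApproval"):
        return requested  # no system modification: no approval needed
    approved = ask_human("Terry requests approval for:\n" + "\n".join(requested))
    return requested if approved else []
```

In the real workflow the approved instruction is looped back into Terry rather than executed directly, but the control point is the same: nothing system-modifying runs without a yes.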

What was the purpose of the port-conflict test, and what did it reveal?

The test intentionally breaks the environment by running a Python server on the same port as the Dockerized website, creating a conflict. Terry initially fails when constrained to known Docker restart steps, because the container can’t bind the port. When the prompt is loosened to allow broader CLI troubleshooting, Terry can identify the conflicting Python process and resolve it—but the episode also demonstrates why approvals and guardrails are necessary to stop the agent from taking unsafe actions.
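The “broader CLI troubleshooting” the looser prompt enables amounts to questions like: which process holds the port my container needs? A minimal sketch of that step, parsing `ss -ltnp`-style output (the sample line format follows the iproute2 `ss` tool; treating it as Terry’s actual command is an assumption):

```python
import re

def find_port_holder(ss_output: str, port: int):
    """Find which process holds a listening TCP port from `ss -ltnp`-style
    output. Returns (process_name, pid), or None if the port is free."""
    for line in ss_output.splitlines():
        if f":{port} " in line:  # local address column ends in :PORT
            m = re.search(r'\("([^"]+)",pid=(\d+)', line)
            if m:
                return m.group(1), int(m.group(2))
    return None
```

Against the port-conflict scenario, this is the identification half; the fix half (terminate the process, restart the container, re-check the site) is what the approval gate protects.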

How does the setup scale from one service to multiple home-lab systems?

After the core monitoring/troubleshooting/fix pattern works, Terry is connected to additional tools and given new system prompts. With Twingate, the cloud-hosted n8n instance can securely reach the home lab. Then Terry gains UniFi API access to identify top bandwidth hogs, Proxmox API/CLI access to query VM counts, and Plex API access to check active streams and control playback. Each integration follows the same pattern: add tools, define the agent’s role, and use structured outputs plus notifications/approvals as needed.
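The scaling pattern above can be summarized as a small persona registry. The system names come from the video; the registry structure, role wording, and tool names are all assumptions of this sketch:

```python
# Sketch of the repeatable scaling pattern: one persona + tool set per
# system. Tool names here are illustrative placeholders, not real n8n nodes.
AGENT_PERSONAS = {
    "unifi":   {"role": "network analyst",
                "tools": ["unifi_api_bandwidth_report"]},
    "proxmox": {"role": "virtualization admin",
                "tools": ["proxmox_api_list_vms", "proxmox_cli"]},
    "plex":    {"role": "media server operator",
                "tools": ["plex_api_sessions", "plex_api_playback_control"]},
}

def persona_prompt(system: str) -> str:
    """Compose a role-specific system prompt for one integration."""
    p = AGENT_PERSONAS[system]
    return (f"You are Terry, acting as a {p['role']}. "
            f"Available tools: {', '.join(p['tools'])}. "
            "Use structured output and request approval before changes.")
```

Each new system reuses the same guardrails; only the role and the tool list change.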

Review Questions

  1. What inputs must be injected (and why) when replacing a chat trigger with a schedule trigger for an AI agent that uses memory?
  2. Describe how structured output changes the way alerts are routed compared with free-text responses.
  3. Why is human-in-the-loop especially important once the agent is allowed to execute fixes, and how does the workflow implement it with Telegram?

Key Points

  1. Terry becomes useful by being taught repeatable troubleshooting steps using concrete tools: HTTP checks, SSH/CLI command execution, and Docker commands.
  2. Scheduled monitoring requires injecting both a user prompt and a session/chat ID so the agent’s memory and prompt wiring still work.
  3. Structured output (JSON) enables reliable alert filtering—Telegram notifications can trigger only when a boolean condition indicates a real problem.
  4. Allowing automated fixes without guardrails can cause harmful actions; human-in-the-loop approval prevents executing system-modifying commands without consent.
  5. A port-conflict scenario demonstrates the difference between “known playbooks” and genuinely adaptive troubleshooting, and it motivates broader prompts plus approvals.
  6. Secure remote access (via Twingate) lets a cloud-hosted n8n agent operate on a local home lab continuously.
  7. The same agent pattern scales across UniFi, Proxmox, Plex, and other systems by adding the right API/CLI tools and role-specific prompts.

Highlights

  • The workflow turns “website down” into a full IT loop: verify with HTTP, diagnose with docker ps/inspect/logs over SSH, then notify via Telegram only when needed.
  • Structured output makes the agent’s results actionable: booleans drive branching logic instead of fragile text parsing.
  • Human-in-the-loop approval is the safety valve that lets Terry fix problems while keeping a human in control of commands that modify systems.
  • A port-conflict test shows why narrow permissions aren’t enough for real incidents—and why approvals are essential once troubleshooting expands.

Topics

Mentioned

  • n8n
  • AI
  • CLI
  • API
  • SSH
  • JSON