FREE Phone Calls with Claude Code

NetworkChuck · 5 min read

Based on NetworkChuck's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

SIP signaling connects a phone system to Claude Code, while a separate media pipeline handles audio capture, transcription, and speech synthesis.

Briefing

A hobby VoIP setup can be wired into Claude Code so phone calls—down to an analog payphone—can trigger AI workflows, keep conversational context, and even run business actions like creating ClickUp tasks or generating Slack messages. The core breakthrough is treating SIP (the call-signaling protocol behind most VoIP systems) as the bridge between a phone system and Claude Code, then adding a separate “media” layer for audio and speech-to-text/text-to-speech.

The build starts with 3CX’s AI receptionist and transcription features, which the creator sets up in about 10 seconds by adding an API key and configuring an AI agent (“Dolores Umbridge”) with call-handling context. The receptionist can handle both normal requests and edge cases like abusive or frustrated callers, and it supports transcription via OpenAI, Google, or a local 3CX option. That quick success sparks the bigger idea: if a phone system can talk to Claude Code, then Claude Code becomes reachable from anywhere, without requiring the caller to understand anything about APIs.

To make calls work end-to-end, the project leans on SIP for signaling and a separate media path for the actual audio stream. The creator initially considers a commercial all-in-one SIP/media solution (jambonz), but balks at the cost: about $1,000 per month for a single node. Instead, the setup uses free, open-source components: FreeSWITCH as the SIP stack, plus an audio-processing path built from voice activity detection, Whisper for speech-to-text, and ElevenLabs for text-to-speech. A wrapper server on a Mac handles the handoff: it detects when the user speaks, transcribes the audio, sends the text into Claude Code, then plays Claude Code’s response back as speech through the VoIP pipeline.
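As a rough illustration of the "detects when the user speaks" step, here is a minimal energy-threshold sketch in Python. The summary does not say which voice activity detector the project uses, so this is a simple stand-in rather than the creator's method; the function names, the threshold of 500, and the three-frame hangover are all assumptions chosen for the example.

```python
import array
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit PCM (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_utterances(frames: Sequence[bytes],
                    threshold: float = 500.0,
                    hangover: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech; end is exclusive.

    A segment opens at the first frame at or above `threshold` and closes
    once `hangover` consecutive quiet frames have been seen.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    silent = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= hangover:
                # Trim the trailing quiet frames off the segment.
                segments.append((start, i - silent + 1))
                start = None
                silent = 0
    if start is not None:
        segments.append((start, len(frames) - silent))
    return segments
```

Each returned segment marks the frames that would be handed to speech-to-text as one utterance. A production detector would more likely use a trained model than a fixed energy threshold.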

A key constraint is that 3CX’s free tier doesn’t allow custom SIP trunks. The workaround is to register Claude Code as if it were a phone endpoint inside the 3CX system, so it appears as an extension with a ready status and can receive calls directly. From there, the creator defines “call skills” that let Claude Code perform actions during a live call, and sets up an agent persona, Morpheus, as an executive assistant with access to those skills.

The practical payoff shows up in demos: Morpheus can create ClickUp tasks and send Slack messages with the task link, while maintaining context across the same call session. It also supports “fire and forget” workflows—calling Morpheus, triggering a job (like generating hyper-realistic thumbnails), then hanging up while results arrive later via Slack.

Finally, the project moves from entertainment to operations with an n8n workflow that checks storage cluster health (e.g., SSD pool capacity thresholds). When conditions are met, an HTTP request triggers the Claude Code-backed phone agent (“Stephanie”) to call the creator, ask for details, and then send a Slack update using its Slack skill. The result is a proof of concept for AI-driven, phone-based monitoring and response: an interface that can reach the creator even when they’re away from the dashboard.

The creator frames it as a janky but free proof of concept, with separate installation documentation and a video promised. The bigger message is that once SIP signaling and Claude Code are connected, phone access becomes a control surface for AI, capable of running real workflows rather than just answering questions.

Cornell Notes

The project connects a phone system to Claude Code by using SIP for call signaling and a separate audio pipeline for speech. After setting up 3CX’s AI receptionist and transcription features, the creator builds a bridge where calls to a registered Claude Code extension can trigger Claude Code skills and keep conversational context. A wrapper server performs voice activity detection, transcribes speech with Whisper, and voices Claude Code’s responses with ElevenLabs text-to-speech. The system then runs real actions, such as creating ClickUp tasks, sending Slack messages, and monitoring storage cluster health via n8n, and calls or messages the user when thresholds are crossed. It matters because it turns phone calls into an interface for AI workflows, usable even from places with no internet access (e.g., payphones).

How does SIP fit into the Claude Code phone-call bridge?

SIP (Session Initiation Protocol) handles the signaling that sets up and routes VoIP calls. The creator’s concept is to give Claude Code a SIP-facing endpoint so 3CX can communicate with it over SIP messaging. Once SIP establishes the call, a separate “media” path carries the actual audio (voice, hold music, etc.). Without SIP signaling, the phone system has no way to connect the caller to the AI endpoint.
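To make the signaling/media split concrete: in a typical SIP call, the signaling messages carry an SDP body that says where the audio should be sent, and the audio itself then flows separately as RTP. The Python sketch below pulls the audio address and port out of a sample SDP body. The sample values (documentation address 192.0.2.10, port 40000) and the function name are made up for illustration and do not come from the video.

```python
from typing import Dict, List, Optional, Union

SdpInfo = Dict[str, Union[Optional[str], Optional[int], List[int]]]


def parse_sdp_audio(sdp: str) -> SdpInfo:
    """Extract the connection address, audio port, and payload types from SDP."""
    info: SdpInfo = {"address": None, "port": None, "payload_types": []}
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            info["address"] = line[2:].split()[2]
        elif line.startswith("m=audio"):
            # m=audio <port> <proto> <payload type list>
            parts = line[2:].split()
            info["port"] = int(parts[1])
            info["payload_types"] = [int(p) for p in parts[3:]]
    return info


# Hypothetical SDP body of the kind carried inside a SIP INVITE.
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 40000 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

if __name__ == "__main__":
    print(parse_sdp_audio(EXAMPLE_SDP))
```

The point of the example is that SIP only negotiates the call; the address and port it advertises belong to whatever component actually handles the audio.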

Why was a commercial all-in-one SIP/media solution avoided, and what replaced it?

Research pointed to jambonz, which appeared to handle both the SIP server and media server roles, but pricing was prohibitive: about $1,000 per month for a single self-hosted node. The workaround uses free, open-source components: FreeSWITCH as the SIP stack, plus a separate audio-processing setup. The creator then stitches them together with a wrapper server that hands transcribed text to Claude Code and returns synthesized speech back into the call.

What does the wrapper server do during a live call?

The wrapper server runs on the creator’s Mac and performs voice activity detection to determine when the user starts and stops speaking. It then uses Whisper to convert speech to text, sends that text into Claude Code, and uses ElevenLabs to convert Claude Code’s response back into speech. This is the “media” side that complements SIP’s call setup.
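The per-turn flow described above can be sketched in Python as a small session object. This is only an outline under assumptions: the three callables are hypothetical stand-ins for Whisper, Claude Code, and ElevenLabs (none of their real APIs are shown), and keeping a `history` list is one plausible way to model the context that persists across a call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]


@dataclass
class CallSession:
    """One phone call: speech in, agent reply out, context kept between turns."""

    transcribe: Callable[[bytes], str]        # stand-in for speech-to-text
    ask_agent: Callable[[History, str], str]  # stand-in for the Claude Code handoff
    synthesize: Callable[[str], bytes]        # stand-in for text-to-speech
    history: History = field(default_factory=list)

    def handle_turn(self, audio: bytes) -> bytes:
        """Process one detected utterance and return audio to play back."""
        text = self.transcribe(audio).strip()
        if not text:
            # Nothing intelligible was said, so play nothing back.
            return b""
        reply = self.ask_agent(self.history, text)
        self.history.append((text, reply))
        return self.synthesize(reply)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any external service.
    session = CallSession(
        transcribe=lambda audio: audio.decode("utf-8"),
        ask_agent=lambda history, text: f"turn {len(history) + 1}: {text}",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(session.handle_turn(b"create a task"))
```

Passing the services in as callables keeps the loop testable without network access and makes it easy to swap any one stage.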

How does the setup work without custom SIP trunks on 3CX’s free tier?

Instead of using a SIP trunk, Claude Code registers directly with the 3CX phone system as if it were a phone endpoint. That makes Claude Code show up as an extension with a ready status that can receive calls. This avoids the free-tier limitation on custom SIP trunks while still letting callers reach the AI through normal dialing.
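For a sense of what "registering as an extension" looks like on the wire, the Python sketch below builds a bare SIP REGISTER request as text. It is illustrative only: in the project described here FreeSWITCH does the SIP work, and a PBX would normally answer a first REGISTER like this with a 401 challenge that requires digest authentication, which is omitted. The extension number, domain, and IP address are hypothetical.

```python
import uuid


def build_register(extension: str, domain: str, local_ip: str,
                   local_port: int = 5060, expires: int = 3600) -> str:
    """Build an unauthenticated SIP REGISTER request (header layout per RFC 3261)."""
    aor = f"sip:{extension}@{domain}"       # address-of-record being registered
    branch = "z9hG4bK" + uuid.uuid4().hex   # branch IDs must start with this cookie
    lines = [
        f"REGISTER sip:{domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{aor}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{aor}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 REGISTER",
        # Contact tells the PBX where to send calls for this extension.
        f"Contact: <sip:{extension}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line to end the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_register("1001", "pbx.example.com", "192.0.2.20"))
```

The `Contact` header is what makes the endpoint reachable: once the PBX accepts the registration, calls dialed to the extension are routed to that address.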

What kinds of real tasks can the AI perform during or after a call?

During calls, Morpheus (the executive assistant) can use Claude Code skills to create ClickUp tasks and send Slack messages containing the task link. It also maintains context across the session, so follow-up requests can reference earlier actions. For longer jobs, it supports “fire and forget”: the caller triggers work (like generating hyper-realistic thumbnails), hangs up, and later receives results via Slack.
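The “fire and forget” pattern can be sketched in Python with a background worker: the call handler submits the job and returns immediately, and a notifier delivers the result later. The `notify` callable here is a hypothetical stand-in for the Slack skill, and the thread pool is just one way to run work after the caller hangs up; the summary does not describe how the project schedules these jobs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# A small shared pool for jobs that outlive the phone call.
_executor = ThreadPoolExecutor(max_workers=2)


def fire_and_forget(job: Callable[[], str],
                    notify: Callable[[str], None]) -> Future:
    """Start `job` in the background and pass its result to `notify` when done.

    Returns immediately, so the call can end while the job keeps running.
    """
    def run() -> None:
        result = job()
        notify(result)

    return _executor.submit(run)


if __name__ == "__main__":
    delivered = []
    future = fire_and_forget(lambda: "thumbnails ready", delivered.append)
    future.result(timeout=5)  # only the demo waits; a call handler would not
    print(delivered)
```

The returned `Future` is optional bookkeeping; a real handler could ignore it or use it to report failures.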

How is monitoring and alerting implemented using phone-based AI?

An n8n workflow checks storage cluster health (for example, SSD pool capacity over 70%). The workflow sends an HTTP request to the local API server, which then prompts the phone-based agent (“Stephanie”) to call the creator. Stephanie can ask for details, troubleshoot, and then send a Slack message when finished, turning operational alerts into interactive phone conversations.
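The threshold check and the HTTP trigger can be sketched in Python as follows. In the project this logic lives in an n8n workflow, not in Python, so this is only an analogy. The endpoint URL, the JSON field name, and the pool name are assumptions for illustration; the summary does not describe the local API server's actual interface. The request is built but not sent.

```python
import json
import urllib.request


def should_alert(used_bytes: int, total_bytes: int,
                 threshold_pct: float = 70.0) -> bool:
    """True when pool usage is strictly above the threshold percentage."""
    return total_bytes > 0 and (used_bytes / total_bytes) * 100 > threshold_pct


def build_call_request(url: str, pool: str,
                       used_pct: float) -> urllib.request.Request:
    """Build (without sending) a POST asking the phone agent to place a call."""
    body = json.dumps({"reason": f"{pool} at {used_pct:.0f}% capacity"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    used, total = 75, 100
    if should_alert(used, total):
        # Hypothetical local endpoint; urllib.request.urlopen(req) would send it.
        req = build_call_request("http://localhost:8080/call", "ssd-pool",
                                 used / total * 100)
        print(req.get_method(), req.full_url, req.data)
```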

Review Questions

  1. What roles do SIP signaling and the media pipeline play in making an AI phone agent work?
  2. Why does registering Claude Code as an extension matter for compatibility with 3CX’s free tier?
  3. Describe one demo workflow (ClickUp/Slack or storage monitoring) and how the system delivers the result to the user.

Key Points

  1. SIP signaling connects a phone system to Claude Code, while a separate media pipeline handles audio capture, transcription, and speech synthesis.
  2. 3CX’s AI receptionist and transcription features are used as a starting point, but the main leap is turning phone calls into Claude Code-triggered workflows.
  3. FreeSWITCH is used as the SIP stack, avoiding a costly all-in-one SIP/media option (jambonz) by relying on open-source components.
  4. A local wrapper server performs voice activity detection, Whisper speech-to-text, and ElevenLabs text-to-speech to complete the call loop.
  5. Because the 3CX free tier blocks custom SIP trunks, Claude Code is registered as an extension so it can receive calls directly.
  6. Claude Code “call skills” let the system run real actions like creating ClickUp tasks and sending Slack messages, with context preserved across a call.
  7. n8n can monitor infrastructure thresholds and trigger phone-based AI troubleshooting and follow-up notifications via Slack.

Highlights

Claude Code can be reached like a phone extension inside 3CX, so dialing an AI endpoint can trigger real automation, with no custom SIP trunk required.
The call loop is built from two halves: FreeSWITCH handles SIP signaling and call setup, while Whisper and ElevenLabs handle the audio-to-text-to-audio path.
Live calls can create ClickUp tasks and send Slack messages, while longer jobs can run after the caller hangs up and deliver results later.
Storage monitoring becomes interactive: n8n checks capacity thresholds and triggers a phone agent to troubleshoot and then report back on Slack.

Topics

  • Claude Code Integration
  • VoIP SIP
  • FreeSWITCH
  • Whisper Transcription
  • ElevenLabs TTS
  • 3CX AI Receptionist
  • n8n Monitoring

Mentioned