How to automate inbound phone calls | Voice AI Agent · n8n · Twilio · Ultravox
Based on Alex, PhD AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A practical voice-AI pipeline links inbound phone calls to an Ultravox conversational agent, with n8n acting as the glue and Twilio handling the telephony. When a customer calls a Twilio number, Twilio sends a webhook into an n8n workflow; n8n then calls Ultravox to create a live media stream and returns that stream URL to Twilio. From that point, Twilio streams the caller’s audio directly to Ultravox, so a scripted agent can speak with the caller in real time, supported by prompts, tools, and retrieval-augmented generation (RAG).
The build starts inside Ultravox by creating an agent tailored to a specific business. Using a real-estate investment agency as the example, the agent is configured with a formal, professional male voice, English language output with a German accent, and a clear call purpose: qualify inbound leads, answer general questions, book appointments, explain investment options, and collect contact details. The agent is also instructed to transfer to a human when requested and to respond only to inbound calls. A system prompt is generated using a “prompt builder” approach, then refined with business facts such as portfolio size and entry investment thresholds (e.g., luxury vacation property co-ownership starting around €169,000, plus details like total portfolio value and property count).
To make the agent knowledgeable about the company, the workflow adds RAG sourced from the agency’s website. A new source is created from the domain with depth set to include sublinks, then the data is processed into chunks and vectors. In the example run, the website parsing produced 182 pages, split into 350 text chunks, and converted into 1,213 vectors. After the RAG collection finishes processing, the agent is updated to use it, so answers can draw from the company’s actual materials rather than generic training.
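To illustrate what "split into text chunks" can mean, here is a minimal sketch of fixed-size chunking with overlap. The source does not say how Ultravox chunks pages internally, so the function name `chunkText`, the character-based sizing, and the overlap parameter are all assumptions made for illustration only.

```typescript
// Hypothetical helper: split text into fixed-size character windows that
// overlap, so a sentence cut at a boundary still appears whole in one chunk.
// This is NOT Ultravox's documented chunking strategy, only an illustration.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  // Each window starts (size - overlap) characters after the previous one.
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Usage: chunkText("abcdefghij", 4, 1) yields ["abcd", "defg", "ghij"]
```

Each chunk would then be passed to an embedding model to produce the vectors used for retrieval; that step is omitted here because the source gives no details about it.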
The second half of the system is the n8n workflow. It begins with a webhook node designed to receive Twilio’s production webhook when a call arrives. The workflow then uses an HTTP Request node to invoke Ultravox’s “create agent call” API endpoint, passing the agent ID and authentication headers (including an Ultravox API key stored in n8n). A key implementation detail is a JavaScript step that constructs TwiML (Twilio’s XML-based instruction format), embedding the Ultravox-generated media stream URL along with identifiers like the call SID and caller number. If Ultravox is unreachable, the workflow falls back to a hard-coded apology message.
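The TwiML step with its fallback might look roughly like the sketch below. It is written in TypeScript rather than as the exact n8n Code node from the video. The field name `joinUrl`, the `<Parameter>` names, and the apology wording are assumptions; `<Connect>`, `<Stream>`, `<Parameter>`, and `<Say>` are standard TwiML elements.

```typescript
// Escape characters that are special in XML attribute values and text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface CallInfo {
  joinUrl?: string; // assumed name of the stream URL field returned by Ultravox
  callSid: string;  // Twilio call identifier from the webhook
  from: string;     // caller number from the webhook
}

// Build the TwiML that n8n returns to Twilio.
function buildTwiml(info: CallInfo): string {
  const header = '<?xml version="1.0" encoding="UTF-8"?>';
  if (!info.joinUrl) {
    // Fallback: no stream URL (e.g. Ultravox unreachable), so apologize.
    return `${header}<Response><Say>Sorry, we cannot take your call right now. Please try again later.</Say></Response>`;
  }
  return [
    header,
    "<Response><Connect>",
    `<Stream url="${escapeXml(info.joinUrl)}">`,
    // Hypothetical parameter names carrying the call identifiers.
    `<Parameter name="callSid" value="${escapeXml(info.callSid)}"/>`,
    `<Parameter name="from" value="${escapeXml(info.from)}"/>`,
    "</Stream>",
    "</Connect></Response>",
  ].join("");
}
```

Escaping the URL matters because stream URLs often contain `&` in their query string, which would otherwise produce invalid XML that Twilio rejects.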
Once the workflow is activated, a test call demonstrates the end-to-end behavior: the agent greets the caller, asks about investment goals, probes for budget and timeline, and then offers options aligned to the caller’s stated €50,000 budget, suggesting alternatives such as real estate funds or diversified property portfolios. It also requests personal contact information for follow-up and appointment booking, showing how lead qualification can be automated without sacrificing the conversational flow.
Cornell Notes
The system automates inbound phone calls by routing Twilio audio into an Ultravox voice agent, with n8n orchestrating the handoff. A Twilio webhook triggers an n8n workflow, which calls Ultravox to create a media stream and returns the stream URL to Twilio via TwiML. In Ultravox, the agent is configured with a business-specific system prompt, a tool (e.g., a hang-up action), and RAG built from the company website. In the example, the RAG pipeline processed 182 pages into 350 chunks and 1,213 vectors, enabling the agent to answer property and investment questions grounded in the source material. The result is a scripted, real-time lead-qualifying phone conversation that can collect contact details and book next steps.
How does the call actually move from Twilio to Ultravox?
What must be configured in Ultravox to make the agent business-ready?
How does RAG get added, and what does “processed” mean in this setup?
What role does n8n play beyond triggering the Ultravox API call?
What does the TwiML construction need to include for the integration to work?
How does the agent behave during the sample call?
Review Questions
- What sequence of requests and responses connects an inbound Twilio call to an Ultravox media stream?
- Which Ultravox settings (prompt, tools, RAG) are essential for the agent to qualify leads and answer property questions accurately?
- In the n8n workflow, what is the purpose of the JavaScript step that generates TwiML, and what happens when the Ultravox API call fails?
Key Points
1. Create the Ultravox agent first with a business-specific system prompt, formal/professional voice settings, and conversation scripts for lead qualification and follow-up.
2. Add RAG by ingesting the company website domain (including sublinks) and wait for the collection to finish processing into chunks and vectors.
3. Use n8n as the orchestrator: a webhook receives Twilio’s inbound call event, then an HTTP Request node calls Ultravox’s “create agent call” API with the agent ID.
4. Return TwiML to Twilio that embeds the Ultravox-generated media stream URL, along with call identifiers like Call SID and caller number, so Twilio can stream audio to Ultravox.
5. Store the Ultravox API key in n8n and send it via header authentication; make sure the header name matches the required “X-API-Key” format exactly.
6. Implement a fallback response in the n8n JavaScript logic so callers receive a clear message if Ultravox is unavailable.
7. Test end-to-end with a real inbound call to confirm the agent can ask budget/timeline questions, propose investment options, and collect contact details.
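The HTTP Request step and header authentication from the key points can be sketched as a plain request description. This is a hedged illustration, not the exact node configuration from the video: the URL path, the `medium: { twilio: {} }` body, and the helper name `buildCreateCallRequest` are assumptions that should be checked against the Ultravox API reference. Only the API-key header convention comes from the source.

```typescript
// Plain description of an HTTP request, mirroring the fields an
// n8n HTTP Request node would need (method, URL, headers, body).
interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: describe the "create agent call" request.
// The endpoint path and body shape are assumptions, not confirmed by the source.
function buildCreateCallRequest(agentId: string, apiKey: string): HttpRequestSpec {
  if (!agentId || !apiKey) {
    throw new Error("agentId and apiKey are required");
  }
  return {
    method: "POST",
    url: `https://api.ultravox.ai/api/agents/${encodeURIComponent(agentId)}/calls`,
    headers: {
      "X-API-Key": apiKey, // header authentication with the stored key
      "Content-Type": "application/json",
    },
    // Assumed body: ask for a Twilio-compatible media stream.
    body: JSON.stringify({ medium: { twilio: {} } }),
  };
}

// Usage: pass the result to any HTTP client, for example
// fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })
```

Keeping the request as data makes it easy to compare field by field with the n8n node settings, and keeps the API key out of the workflow logic itself.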