Realtime Voice AI AGENTS Will Explode in 2025 | SHOWCASE

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The appointment workflow relies on real-time voice plus function calling to query a scheduling database for exact date/slot availability.

Briefing

Real-time voice AI agents are moving from demos to practical business workflows—using function calling to check availability, confirm bookings, and write results into a database during a live phone conversation. The showcase centers on an “AI Dental” receptionist that answers calls, gathers appointment details, queries a scheduling database for open slots, rejects unavailable times, and then books the customer automatically.

The system is built around a real-time API plus a tightly controlled system message that defines the agent’s role and the exact tools it can use. When a caller asks for a specific time—like “9 a.m. tomorrow”—the agent triggers a function call that checks the database for that exact date and slot. If the requested slot is already taken, the agent responds with an apology and offers alternatives by calling another function to list available slots. Once the caller selects an open time (for example, “11 a.m.”), the agent confirms the appointment, collects contact information and any special requirements, and records the booking.
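The two tool calls described above can be sketched against a small SQLite table. This is only an illustration under stated assumptions: the table layout, the hourly slot grid, and the function names `check_availability` and `list_available_slots` are hypothetical stand-ins, since the source does not show the actual schema or code.

```python
import sqlite3

# Assumption: the clinic offers one-hour slots from 09:00 to 16:00.
ALL_SLOTS = [f"{hour:02d}:00" for hour in range(9, 17)]


def make_db() -> sqlite3.Connection:
    """Create an in-memory bookings table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings ("
        "date TEXT NOT NULL, slot TEXT NOT NULL, UNIQUE (date, slot))"
    )
    return conn


def check_availability(conn: sqlite3.Connection, date: str, slot: str) -> bool:
    """Return True when the exact date/slot pair has no booking."""
    row = conn.execute(
        "SELECT 1 FROM bookings WHERE date = ? AND slot = ?", (date, slot)
    ).fetchone()
    return row is None


def list_available_slots(
    conn: sqlite3.Connection, date: str, limit: int = 3
) -> list[str]:
    """Return up to `limit` open slots for the given date, earliest first."""
    taken = {
        r[0]
        for r in conn.execute("SELECT slot FROM bookings WHERE date = ?", (date,))
    }
    return [s for s in ALL_SLOTS if s not in taken][:limit]


conn = make_db()
conn.execute("INSERT INTO bookings VALUES ('2025-01-15', '09:00')")
print(check_availability(conn, "2025-01-15", "09:00"))  # False
print(list_available_slots(conn, "2025-01-15"))  # ['10:00', '11:00', '12:00']
```

Keeping the exact-slot check and the slot listing as separate functions mirrors the two tools the agent is told to use, so each maps to a single caller intent.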

A key operational detail is the pipeline that turns a live call into structured data. The conversation is recorded and converted into an MP3 file, then transcribed using Whisper. From the transcript, structured outputs extract the fields needed to fill the appointment schema—first and last name, appointment date, chosen slot, contact email/phone, and special requests (such as requesting “calm background music” or even a “beer” to be offered during the visit). Those extracted values are then saved to the database and reflected on a dashboard, demonstrating end-to-end automation from voice interaction to persistent scheduling records.
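The last step of that pipeline, turning extracted fields into a stored record, might look roughly like the sketch below. It assumes the transcription and structured-output steps have already produced a JSON string; the `Appointment` field names, the `appointments` table, and the helper functions are all hypothetical, chosen to match the fields listed above rather than the creator's actual schema.

```python
import json
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Appointment:
    """Hypothetical appointment schema based on the fields named in the text."""
    first_name: str
    last_name: str
    appointment_date: str  # ISO date, e.g. "2025-01-15"
    slot: str              # e.g. "11:00"
    contact: str           # email or phone
    special_requirements: str = ""


REQUIRED = ("first_name", "last_name", "appointment_date", "slot", "contact")


def parse_extraction(raw_json: str) -> Appointment:
    """Validate the extracted JSON and build an Appointment from it."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED if not data.get(key)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    known = {f.name for f in fields(Appointment)}
    return Appointment(**{k: v for k, v in data.items() if k in known})


def save(conn: sqlite3.Connection, appt: Appointment) -> None:
    """Persist one appointment; column order matches the dataclass fields."""
    with conn:
        conn.execute(
            "INSERT INTO appointments VALUES (?, ?, ?, ?, ?, ?)", astuple(appt)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointments (first_name TEXT, last_name TEXT, "
    "appointment_date TEXT, slot TEXT, contact TEXT, special_requirements TEXT)"
)

# Stand-in for the structured output extracted from a call transcript.
extracted = json.dumps({
    "first_name": "Ada", "last_name": "Lovelace",
    "appointment_date": "2025-01-15", "slot": "11:00",
    "contact": "ada@example.com",
    "special_requirements": "calm background music",
})
save(conn, parse_extraction(extracted))
```

Rejecting records with missing required fields before the insert is one way to keep imperfect extractions from reaching the dashboard silently.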

The agent’s behavior also depends heavily on instruction-following. The system message instructs the agent to greet and identify the caller’s needs, collect required information, use the “list available slots” tool when asked for options, use the “check availability” tool when a specific time is requested, and always read back the confirmed date and time. The creator notes that some fields can be buggy (special requirements and certain extracted details aren’t always perfect yet), but the overall booking loop works reliably enough to be compelling.
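A system message of this kind, paired with its tool definitions, could be written along the following lines. The wording of the prompt, the tool names, and the parameter names are assumptions made for illustration; the tool entries follow the general JSON Schema style used for function calling, and the exact payload expected by the real-time API should be checked against its documentation.

```python
# Hypothetical system message paraphrasing the workflow described in the text.
SYSTEM_MESSAGE = """\
You are the receptionist for AI Dental.
1. Greet the caller and identify what they need.
2. Collect the required information: name, date, time, contact details,
   and any special requirements.
3. If the caller asks for options, call list_available_slots.
4. If the caller asks for a specific time, call check_availability.
5. Always read back the confirmed date and time.
"""

# Hypothetical tool definitions in JSON Schema style.
TOOLS = [
    {
        "type": "function",
        "name": "check_availability",
        "description": "Check whether one exact date and slot is free.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "ISO date"},
                "slot": {"type": "string", "description": "Start time, HH:MM"},
            },
            "required": ["date", "slot"],
        },
    },
    {
        "type": "function",
        "name": "list_available_slots",
        "description": "List open slots for a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "ISO date"},
            },
            "required": ["date"],
        },
    },
]


def tools_missing_from_prompt(prompt: str, tools: list[dict]) -> list[str]:
    """Return names of tools that the prompt never mentions."""
    return [tool["name"] for tool in tools if tool["name"] not in prompt]


print(tools_missing_from_prompt(SYSTEM_MESSAGE, TOOLS))  # []
```

The small consistency check is a design choice: if a tool is defined but never named in the instructions, the agent has no guidance on when to call it.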

Beyond scheduling logic, the showcase highlights voice and personality flexibility. Using the real-time API playground, different voices (e.g., “Sage” and “Ash”) can be selected, and the agent’s spoken style can be adjusted with voice configuration settings such as accent and pacing. Quick call snippets show the receptionist persona shifting while still performing the same appointment workflow.

The takeaway for 2025 is less about a single dental use case and more about a repeatable pattern: real-time voice input, tool-based function calling for database-backed decisions, structured extraction for reliable record-keeping, and configurable voice/personality for user experience. With more testing before production, the approach points to a broader wave of voice agents that can handle customer interactions—answering questions, checking availability, and completing transactions—without human intervention.

Cornell Notes

A real-time voice AI receptionist automates dental appointment bookings by combining a real-time API with function calling and a database-backed scheduling workflow. During a call, the agent identifies the requested date and time, checks availability for exact slots, lists alternatives when a time is taken, and confirms the chosen appointment. After the conversation, the system records the audio, transcribes it with Whisper, and uses structured outputs to extract key fields such as name, appointment date, slot, contact info, and special requirements. It then saves the booking to the database, where it appears on the dashboard. The system’s reliability depends on a detailed system message that enforces consistent tool use and confirmation behavior. Voice and personality can be swapped by selecting different voices in the real-time API playground.

How does the agent decide whether a requested appointment time is available?

When a caller names a time (e.g., “9 a.m. tomorrow”), the agent passes that exact date and slot to a function call that queries the scheduling database. If the database indicates the slot is taken, the agent responds that the time isn’t available and offers to list open slots. If the slot is free, the agent proceeds to confirmation and booking.
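On the application side, a function call from the model typically arrives as a tool name plus a JSON-encoded argument string, and the result is sent back as JSON. The sketch below shows one way to route such a call; the handler table, the `TAKEN` set, and the result shape are hypothetical and stand in for the real database lookup.

```python
import json

# Hypothetical set of already booked (date, slot) pairs.
TAKEN = {("2025-01-15", "09:00")}


def check_availability(date: str, slot: str) -> dict:
    """Report whether the exact date/slot pair is still open."""
    return {"date": date, "slot": slot, "available": (date, slot) not in TAKEN}


HANDLERS = {"check_availability": check_availability}


def handle_function_call(name: str, arguments: str) -> str:
    """Run the named tool with JSON arguments and return a JSON result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = handler(**json.loads(arguments))
    except (TypeError, json.JSONDecodeError) as exc:
        # Bad or incomplete arguments are reported back instead of crashing.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


reply = handle_function_call(
    "check_availability", '{"date": "2025-01-15", "slot": "09:00"}'
)
print(json.loads(reply)["available"])  # False
```

Returning errors as JSON rather than raising lets the agent apologize or ask again during the live call instead of the session failing.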

What happens when a caller asks for available times instead of a specific slot?

If the caller requests options (e.g., “list like three available slots”), the agent uses a dedicated tool to list available slots for the requested date. The caller then selects one of the offered times, which leads to another availability check before the system confirms the appointment.
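The second availability check matters because a slot that was open when it was listed may be taken by the time the caller chooses it. One common way to make that re-check safe is to let the database enforce it with a uniqueness constraint. This is a sketch of that idea, not the creator's implementation; the `bookings` table and `book_slot` helper are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a bookings table where each date/slot pair can exist once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings (date TEXT NOT NULL, slot TEXT NOT NULL, "
        "name TEXT NOT NULL, UNIQUE (date, slot))"
    )
    return conn


def book_slot(conn: sqlite3.Connection, date: str, slot: str, name: str) -> bool:
    """Try to book; return False if the slot was taken in the meantime."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (date, slot, name) VALUES (?, ?, ?)",
                (date, slot, name),
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = make_db()
print(book_slot(conn, "2025-01-15", "11:00", "Ada"))    # True
print(book_slot(conn, "2025-01-15", "11:00", "Grace"))  # False
```

With the constraint in place, the check and the write happen in one step, so two callers cannot both be confirmed for the same slot.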

How does the system turn a phone conversation into a structured appointment record?

The call is recorded and saved as an MP3 file, then transcribed with Whisper. Structured outputs extract the appointment schema fields from the transcript—first name, last name, appointment date, chosen slot, contact email/phone, and any special requirements. Those extracted values are then written to the database and appear on the dashboard.

Why does the system message matter so much to the agent’s performance?

The system message tightly defines the agent’s role and the allowed tools (e.g., checking availability and listing available slots). It also instructs the agent to follow a consistent workflow: greet and identify needs, collect required information, use the correct tool based on whether the caller asks for a specific time or options, and always read back the confirmed date and time. This instruction discipline is what keeps the conversation aligned with the data schema.

How can the agent’s voice and personality be changed without rewriting the workflow?

Voice configuration can be adjusted in the real-time API playground by selecting different voices (such as “Sage” or “Ash”). The workflow remains the same—tool use, slot checking, confirmation, and structured extraction—while the spoken persona and delivery style change.
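In code, the same separation can be expressed by keeping the booking instructions and tool list fixed and varying only a small persona object. The sketch below assumes a session is configured with a dictionary; the key names, the lowercase voice identifiers, and the `Persona` and `build_session` names are hypothetical, and the real session format should be taken from the API documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona: a voice name plus a speaking-style hint."""
    voice: str
    style: str


# Shared across all personas, so the booking logic never changes.
BASE_INSTRUCTIONS = (
    "You are the receptionist for AI Dental. Follow the booking workflow."
)
TOOL_NAMES = ("check_availability", "list_available_slots")


def build_session(persona: Persona) -> dict:
    """Combine the fixed workflow with a swappable voice and style."""
    return {
        "voice": persona.voice,
        "instructions": f"{BASE_INSTRUCTIONS}\nSpeaking style: {persona.style}",
        "tools": list(TOOL_NAMES),
    }


sage = build_session(Persona(voice="sage", style="calm and unhurried"))
ash = build_session(Persona(voice="ash", style="upbeat and brisk"))
print(sage["tools"] == ash["tools"])  # True
print(sage["voice"], ash["voice"])    # sage ash
```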

Review Questions

  1. What tool/function calls are triggered when a caller requests a specific time versus when they ask for available slots?
  2. Describe the end-to-end pipeline from recorded audio to a saved appointment in the database.
  3. Which extracted fields are required to fill the appointment schema, and how are they obtained from the transcript?

Key Points

  1. The appointment workflow relies on real-time voice plus function calling to query a scheduling database for exact date/slot availability.
  2. When a requested slot is unavailable, the agent switches to listing open slots and then re-checks availability after the caller chooses a time.
  3. A practical pipeline records the call, transcribes it with Whisper, and uses structured outputs to extract appointment fields for database storage.
  4. A detailed system message enforces consistent behavior: correct tool selection, required data collection, and read-back confirmation of date and time.
  5. Voice and personality can be swapped via real-time API voice configuration while keeping the same booking logic.
  6. Special requirements and some extracted details may still be imperfect, so additional testing is needed before production use.

Highlights

The agent books appointments by checking database-backed slot availability in response to exact spoken times (e.g., “9 a.m. tomorrow”).
A recorded-call pipeline—MP3 recording, Whisper transcription, and structured output extraction—turns conversation into database-ready appointment records.
The same scheduling workflow persists while the receptionist’s voice/personality changes by selecting different voices in the real-time API playground.
The system’s reliability hinges on a tightly specified system message that dictates tool use and confirmation steps.

Mentioned

  • MP3
  • DB
  • AI