Vercel Finally Caught Up
Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Active CPU pricing shifts billing from wall-clock duration to CPU time actually used, targeting AI streaming workloads where requests wait on external services.
Briefing
Vercel’s latest release package is a direct attempt to close the cost and reliability gap for long-running, low-CPU workloads—especially AI inference and agent-style tasks—while also adding enterprise-grade security and developer ergonomics. The headline change is “active CPU pricing,” which shifts billing away from wall-clock time toward the moments when code is actually using CPU. That matters because many AI workloads spend most of their lifetime waiting on external services (like OpenAI streaming tokens), meaning traditional “per duration” pricing can punish teams even when their compute utilization is tiny.
The transcript lays out why this pricing mismatch happens. In classic serverless, each request gets an isolated instance that stays alive for the full duration of the request, so billing tracks how long the instance runs, even if the CPU is mostly idle while waiting for I/O. For short web requests (database fetches, rendering), duration and CPU usage correlate well. For AI streaming, they don't: a request can run for tens of seconds while using only brief microbursts of CPU to handle events, verify inputs, and forward streamed output. Under that model, teams can end up paying for long runtimes that don't translate into meaningful CPU work.
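To make that workload shape concrete, here is a minimal sketch (not taken from the video) of a streaming proxy endpoint: nearly all of its wall-clock time is spent awaiting the upstream provider, with only brief CPU bursts to forward chunks. The provider URL, model name, and environment variable are illustrative assumptions, not specifics from the release.

```ts
// Illustrative sketch of a "high duration, low CPU" endpoint: the handler
// mostly awaits the upstream provider and only burns CPU briefly to forward
// each streamed chunk. Provider URL, model, and env var are assumptions.
export async function POST(req: Request): Promise<Response> {
  const { prompt } = await req.json();

  // Long-lived upstream call: the request "runs" for as long as tokens stream.
  const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
  });

  // Forwarding the body is almost pure I/O: wall-clock time is dominated by
  // waiting on the provider, not by CPU work inside this function.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```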
Active CPU pricing is presented as Vercel’s answer to Cloudflare Workers’ “net CPU” approach, where billing tracks CPU time rather than elapsed time. The transcript compares workload shapes using two axes—duration and CPU intensity—and argues that AI endpoints land in a “high duration, low CPU” corner that historically didn’t get much optimization. Vercel’s new model is designed for “I/O-bound backends” that scale instantly but remain idle between operations, such as AI inference agents, MCP servers, and other workflows that don’t fit quick request/response patterns.
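A back-of-the-envelope comparison shows why the "high duration, low CPU" corner is so sensitive to the billing model. The rates below are placeholders, not Vercel's or Cloudflare's published prices, and real pricing may include other components (such as memory over duration); the point is that for this workload shape the two models diverge by roughly the ratio of wall-clock time to CPU time.

```ts
// Hypothetical rates purely for illustration; not actual published pricing.
const ratePerCpuSecond = 0.000018;  // $ per second of active CPU (assumed)
const ratePerWallSecond = 0.000018; // $ per second of wall-clock time (assumed)

const wallClockSeconds = 30;  // request streams tokens for ~30 s
const activeCpuSeconds = 0.2; // but only ~200 ms of that is real CPU work

const durationBilled = wallClockSeconds * ratePerWallSecond;
const activeCpuBilled = activeCpuSeconds * ratePerCpuSecond;

console.log(durationBilled / activeCpuBilled); // 150x gap for this shape
```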
Beyond pricing, the release adds sandboxing for untrusted code via “Vercel Sandbox,” positioned as an SDK-powered, ephemeral execution environment for code generated by AI agents or submitted by users. There’s also “Queues,” a limited-beta message-queue system meant to offload long-running background work so users don’t have to wait for slow operations inside a request. For app security, “BotID” introduces invisible bot filtering for critical routes (login, signup, checkout, and expensive API actions), with a basic mode that’s free across plans and a “deep analysis” mode for stronger detection.
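As an illustration of the pattern Queues is aimed at, the sketch below acknowledges the user immediately and hands slow work to a background consumer. The `enqueue` helper, its module path, and the topic name are hypothetical stand-ins, not the actual Queues API.

```ts
// Sketch of the queue-offload pattern: respond right away and push slow work
// (e.g. a long AI job) onto a queue that a separate consumer drains with
// retries. `enqueue` and "transcribe-video" are hypothetical placeholders.
import { enqueue } from "./queue-client"; // assumed local helper, not a real package

export async function POST(req: Request): Promise<Response> {
  const { videoUrl } = await req.json();

  // Hand the slow job off instead of making the user wait inside the request.
  await enqueue("transcribe-video", { videoUrl, requestedAt: Date.now() });

  // Acknowledge immediately; the consumer processes (and retries) in the background.
  return Response.json({ status: "queued" }, { status: 202 });
}
```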
Finally, Vercel ships an “AI Gateway” beta: a single endpoint to access multiple model providers (including OpenAI, xAI, Anthropic, and Google) with improved routing, observability, and fallback behavior. The transcript frames this as a way to sidestep provider-specific pain, such as rate limits, reliability issues, and negotiation friction, by routing requests to whichever backend performs best.
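The gateway's value proposition is easiest to see as a fallback loop: try one provider/model and, if it is rate-limited or down, route the same request to the next. The endpoint URL, header, and model identifiers below are illustrative assumptions, not the gateway's documented API.

```ts
// Conceptual sketch of provider fallback through a single gateway endpoint.
// The URL, auth header, and model ids are assumptions for illustration only.
const GATEWAY_URL = "https://example-ai-gateway.vercel.app/v1/chat/completions";
const candidates = ["openai/gpt-4o", "anthropic/claude-sonnet-4", "xai/grok-3"];

async function completeWithFallback(prompt: string): Promise<string> {
  for (const model of candidates) {
    const res = await fetch(GATEWAY_URL, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.GATEWAY_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    });

    // On rate limits or provider outages, fall through to the next candidate.
    if (res.status === 429 || res.status >= 500) continue;
    if (!res.ok) throw new Error(`Gateway error: ${res.status}`);

    const data = await res.json();
    return data.choices?.[0]?.message?.content ?? "";
  }
  throw new Error("All candidate models failed or were rate-limited");
}
```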
Taken together, the changes aim to make Vercel more competitive for AI-heavy production systems: cheaper for long-running inference, safer for untrusted code and automation, and more resilient when model providers struggle.
Cornell Notes
Vercel’s biggest shift is “active CPU pricing,” moving billing toward CPU time actually used rather than wall-clock duration. That targets a key mismatch in AI workloads: long-running requests (often streaming tokens) can spend most of their lifetime waiting on external services while using very little CPU. The transcript argues this pricing model is especially important for “high duration, low CPU” workloads like AI inference agents and similar I/O-bound backends.
Alongside pricing, Vercel adds sandboxing for untrusted code (via an SDK), a limited-beta queue system (“Queues”) for background tasks, and “BotID” for invisible bot filtering on critical routes. It also introduces an “AI Gateway” beta to unify access to multiple model providers with routing, observability, and fallback to improve reliability and reduce rate-limit headaches.
Why does wall-clock billing become a problem for AI streaming workloads?
How does active CPU pricing change the cost equation?
What workload shapes benefit most from the new model?
What is Vercel Sandbox meant to solve, and how is it positioned?
How do Queues and BotID address different production pain points?
What problem does the AI gateway aim to fix across model providers?
Review Questions
- Active CPU pricing is designed for which category of workloads, and what two metrics define that category?
- How do Queues and BotID differ in what they protect: user wait time vs. abuse prevention?
- What kinds of provider-specific failures does the AI gateway try to mitigate, and why does fallback matter?
Key Points
1. Active CPU pricing shifts billing from wall-clock duration to CPU time actually used, targeting AI streaming workloads where requests wait on external services.
2. The cost mismatch is worst for “high duration, low CPU” endpoints, common in inference and agent-style systems that stream tokens.
3. Vercel Sandbox provides an SDK-based, isolated environment for running untrusted or AI-generated code safely and ephemerally.
4. Queues (limited beta) enables background processing via message queues so slow operations don’t block user requests and can be retried reliably.
5. BotID adds invisible bot filtering for critical routes, with a free basic mode and a stronger deep-analysis mode for higher-risk actions.
6. The AI Gateway beta unifies access to multiple model providers with routing, fallback, and per-model observability to reduce rate-limit and reliability pain.
7. The release package collectively aims to make Vercel more competitive for production AI: cheaper inference, safer execution, and more resilient model access.