MCP is the wrong abstraction
Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Model Context Protocol (MCP) is getting a reality check: flooding AI agents with tool definitions and tool-call syntax often degrades performance, and the industry may be better served by letting models write code against real APIs instead of forcing them to emit brittle “tool call” formats. Cloudflare’s new direction—turning MCP tools into a TypeScript API and then having the LLM generate TypeScript to call those APIs inside a sandbox—aims to cut context bloat, reduce token waste, and make multi-step tool use more reliable.
The core complaint is practical. Many agent systems expose every available action as an MCP tool, and the transcript argues that "more tools" correlates with worse outcomes: agents get distracted, choose poorly, or fail to execute the right sequence. Concrete comparisons across coding assistants illustrate the pattern. A complex agent setup (e.g., Trae's nested agents and a long list of file operations, command controls, and even third-party tool access) is contrasted with a smaller, more focused tool set (e.g., Codex listing only a handful of functions like shell execution, plan updates, image viewing, and patch application). Other assistants show the same theme: when tool counts are kept tight and search and execution are clearly defined, behavior improves; when tool catalogs balloon, quality drops.
That tool-count problem connects to a deeper mechanism: traditional MCP requires the model to output special tool-call tokens in a format it has mostly seen in synthetic training, not in real-world code. The transcript describes how tool calling works under the hood—LLMs emit structured “tool call” sequences that a harness interprets as JSON-RPC-like requests, then feeds tool results back into the model for another generation. Each tool call becomes its own generation cycle, so intermediate outputs accumulate in the conversation history. The result is context bloat: even if each step starts small, repeated reasoning-plus-tool-result loops can inflate input tokens into the thousands or tens of thousands.
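To make that loop concrete, here is a minimal sketch in TypeScript. The `generate` and `callMcpTool` helpers are hypothetical stand-ins for a model API and an MCP client, not any real SDK; the point is that every tool call is a separate generation over the entire accumulated history.

```typescript
// Minimal sketch of a traditional MCP-style tool-calling loop.

type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: unknown };

// Hypothetical stand-ins for a model API and an MCP client.
declare function generate(
  history: Message[]
): Promise<{ text: string; toolCall?: ToolCall }>;
declare function callMcpTool(call: ToolCall): Promise<unknown>;

async function runAgent(task: string): Promise<string> {
  const history: Message[] = [{ role: "user", content: task }];

  while (true) {
    // Each iteration is a full generation over the ENTIRE history so far.
    const reply = await generate(history);
    history.push({ role: "assistant", content: reply.text });

    // No tool call means the model produced its final answer.
    if (!reply.toolCall) return reply.text;

    // The harness parses the tool-call tokens into a JSON-RPC-like request
    // against the MCP server...
    const result = await callMcpTool(reply.toolCall);

    // ...and appends the raw result to the history, so every intermediate
    // payload is re-ingested (and re-counted as input tokens) next turn.
    history.push({ role: "tool", content: JSON.stringify(result) });
  }
}
```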
Cloudflare’s proposed workaround reframes the interface. Instead of exposing dozens of MCP tools directly, the agent SDK fetches the MCP server schema and converts it into a TypeScript API with documentation. The model then writes code that calls those APIs, and that code runs in a secure, isolated sandbox built on Cloudflare’s V8 isolates. This shifts multi-step orchestration from repeated model generations into a single code-execution path, so the agent reads only the final results it needs rather than re-ingesting every intermediate tool call.
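As a rough illustration, the generated interface and the code the model writes might look like the sketch below. The `GitHubApi` shape and the injected `github` binding are assumptions for illustration, not Cloudflare's actual generated bindings; what matters is that the chaining happens in one execution, with intermediate results held in local variables instead of the context window.

```typescript
// Hypothetical TypeScript API generated from an MCP server's schema.
// Names and shapes are illustrative, not Cloudflare's actual bindings.
interface IssueSummary { id: number; title: string }
interface Issue extends IssueSummary { body: string }

interface GitHubApi {
  /** Doc comments would be derived from the MCP tool descriptions. */
  searchIssues(query: string): Promise<IssueSummary[]>;
  getIssue(id: number): Promise<Issue>;
}

// The harness injects the generated API into the sandboxed isolate.
declare const github: GitHubApi;

// Code the model might write: a multi-step orchestration that runs once.
// Intermediate results live in local variables; only the distilled final
// value crosses back into the model's context.
export default async function run(): Promise<string[]> {
  const hits = await github.searchIssues("label:bug state:open");
  const issues = await Promise.all(hits.map((h) => github.getIssue(h.id)));
  return issues.map((i) => `#${i.id}: ${i.title}`);
}
```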
The transcript also argues MCP’s value is partly structural rather than semantic: MCP provides a uniform way to discover and connect to APIs with attached documentation and standardized connectivity/authorization. But it’s still treated as a patch—useful when legacy systems can’t be represented as code-first configuration. The speaker’s broader skepticism is that hype cycles may be outpacing real adoption, and that long-term, teams will prefer configurations that live directly in the codebase (Terraform/infra-as-code analogies, plus examples like Convex where configuration is local and file-based).
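For contrast, the code-first configuration idea the transcript gestures at might look like the hypothetical sketch below. The `defineConfig` helper and its schema are illustrative, loosely in the spirit of Convex's file-based config, not a real Convex or Terraform API; the point is that the source of truth is a versioned file in the repo rather than state reachable only through a protocol.

```typescript
// Hypothetical code-first configuration. The configuration is an ordinary
// source file, so it is diffable, reviewable, and versioned with the code.

type ServiceConfig = {
  name: string;
  crons: Record<string, string>; // job name -> cron expression
  env: Record<string, string>;
};

// Identity helper: the file itself is the configuration.
function defineConfig(config: ServiceConfig): ServiceConfig {
  return config;
}

export default defineConfig({
  name: "billing-worker",
  crons: { reconcile: "0 3 * * *" },
  env: { REGION: "auto" },
});
```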
In short: MCP may remain useful for interoperability, but Cloudflare’s TypeScript-API-plus-sandbox approach targets the biggest pain points—too many tools, brittle tool-call syntax, and runaway context growth—by letting LLMs do what they’re increasingly good at: writing code to orchestrate deterministic actions.
Cornell Notes
MCP often underperforms when agents are given large tool catalogs and when models must emit special tool-call syntax across many separate generation cycles. The transcript argues that tool-call formats are less “natural” to LLMs than real-world code, and that each tool call forces another reasoning step that inflates context and token usage. Cloudflare’s alternative converts MCP server capabilities into a TypeScript API (with docs) and lets the LLM write TypeScript code that calls those APIs inside a secure sandbox. This approach aims to reduce context bloat and improve reliability, especially when multiple calls must be chained. The broader takeaway is that MCP is valuable for standardizing access to external tools, but code-first interfaces may ultimately be more robust for agent workflows.
- Why does adding more MCP tools tend to make agents worse in practice?
- How does traditional MCP create context bloat and token waste?
- What is Cloudflare’s “TypeScript API + code in a sandbox” alternative to direct MCP tool calls?
- Why does the transcript claim LLMs are better at calling APIs via code than via MCP tool-call syntax?
- What does MCP still do well, even if it’s treated as a patch?
- How does the transcript connect MCP to the broader “configuration as code” debate?
Review Questions
- What specific failure modes are attributed to MCP when agents are given many tools, and how do the transcript’s tool-list comparisons support that claim?
- Explain the token/context-bloat mechanism described for traditional MCP tool calling. How does the TypeScript-code approach change the execution flow?
- What are MCP’s main strengths (uniformity/discovery/authorization handling), and why does the transcript still call it a patch rather than an end state?
Key Points
1. Tool overload is presented as a major reason agent quality drops: large MCP tool catalogs can distract models and worsen tool selection.
2. Traditional MCP tool calling can inflate context because each tool call triggers another model generation cycle with accumulated intermediate results.
3. Cloudflare’s approach converts MCP schemas into a TypeScript API and asks the LLM to write code that calls those APIs, reducing repeated reasoning/tool-call loops.
4. Running generated code in a secure sandbox is treated as essential: it allows eval-like execution while limiting network and tool access.
5. MCP’s value is framed as interoperability and uniform discovery/documentation rather than an optimal interface for model reasoning.
6. The transcript argues long-term wins may come from code-first configuration where system state lives in the repo, not from protocols that patch over external state.
7. The biggest reliability gains are expected when models orchestrate deterministic actions through code, not when they repeatedly emit brittle special tool-call syntax.