
MCP is the wrong abstraction

Theo - t3.gg · 6 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Tool overload is presented as a major reason agent quality drops: large MCP tool catalogs can distract models and worsen tool selection.

Briefing

Model Context Protocol (MCP) is getting a reality check: flooding AI agents with tool definitions and tool-call syntax often degrades performance, and the industry may be better served by letting models write code against real APIs instead of forcing them to emit brittle “tool call” formats. Cloudflare’s new direction—turning MCP tools into a TypeScript API and then having the LLM generate TypeScript to call those APIs inside a sandbox—aims to cut context bloat, reduce token waste, and make multi-step tool use more reliable.

The core complaint is practical. Many agent systems expose every available action as an MCP tool, and the transcript argues that “more tools” correlates with worse outcomes: agents get distracted, choose poorly, or fail to execute the right sequence. Concrete comparisons across coding assistants illustrate the pattern. A complex agent setup (e.g., Trey’s nested agents and a long list of file operations, command controls, and even third-party tool access) is contrasted with a smaller, more focused tool set (e.g., Codex listing only a handful of functions like shell execution, plan updates, image viewing, and patch application). Other assistants show the same theme: when tool counts are kept tight and search/execution are clearly defined, behavior improves; when tool catalogs balloon, quality drops.

That tool-count problem connects to a deeper mechanism: traditional MCP requires the model to output special tool-call tokens in a format it has mostly seen in synthetic training, not in real-world code. The transcript describes how tool calling works under the hood—LLMs emit structured “tool call” sequences that a harness interprets as JSON-RPC-like requests, then feeds tool results back into the model for another generation. Each tool call becomes its own generation cycle, so intermediate outputs accumulate in the conversation history. The result is context bloat: even if each step starts small, repeated reasoning-plus-tool-result loops can inflate input tokens into the thousands or tens of thousands.
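
The loop described above can be sketched in a few lines. This is a hypothetical harness, not any real SDK: `generate` stands in for the model and `runTool` for tool execution, and the point is that every cycle appends to a shared history the model must re-read.

```typescript
// Hypothetical sketch of a tool-calling harness loop (names invented).
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Stand-in model: requests two tools, then produces a final answer.
function generate(history: Message[]): { toolCall?: string; text: string } {
  const calls = history.filter((m) => m.role === "tool").length;
  if (calls === 0) return { toolCall: "listFiles", text: "" };
  if (calls === 1) return { toolCall: "readFile", text: "" };
  return { text: "done" };
}

function runTool(name: string): string {
  return `result of ${name}`;
}

// Each tool call is a full generation cycle; every intermediate
// result is appended to the history, so context grows monotonically.
const history: Message[] = [{ role: "user", content: "fix the bug" }];
let out = generate(history);
while (out.toolCall) {
  history.push({ role: "assistant", content: `CALL ${out.toolCall}` });
  history.push({ role: "tool", content: runTool(out.toolCall) });
  out = generate(history); // model re-reads the ever-growing history
}
// Two tool calls already produce five messages; real tool outputs
// (file contents, search results) make the growth far steeper.
```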

Cloudflare’s proposed workaround reframes the interface. Instead of exposing dozens of MCP tools directly, the agent SDK fetches the MCP server schema and converts it into a TypeScript API with documentation. The model then writes code that calls those APIs, and that code runs in a secure, isolated sandbox (Cloudflare isolates/V8-based). This shifts the “multi-step orchestration” from repeated model generations into a single code execution path, so the agent can read only the final results it needs rather than re-ingesting every intermediate tool call.

The transcript also argues MCP’s value is partly structural rather than semantic: MCP provides a uniform way to discover and connect to APIs with attached documentation and standardized connectivity/authorization. But it’s still treated as a patch—useful when legacy systems can’t be represented as code-first configuration. The speaker’s broader skepticism is that hype cycles may be outpacing real adoption, and that long-term, teams will prefer configurations that live directly in the codebase (Terraform/infra-as-code analogies, plus examples like Convex where configuration is local and file-based).

In short: MCP may remain useful for interoperability, but Cloudflare’s TypeScript-API-plus-sandbox approach targets the biggest pain points—too many tools, brittle tool-call syntax, and runaway context growth—by letting LLMs do what they’re increasingly good at: writing code to orchestrate deterministic actions.

Cornell Notes

MCP often underperforms when agents are given large tool catalogs and when models must emit special tool-call syntax across many separate generation cycles. The transcript argues that tool-call formats are less “natural” to LLMs than real-world code, and that each tool call forces another reasoning step that inflates context and token usage. Cloudflare’s alternative converts MCP server capabilities into a TypeScript API (with docs) and lets the LLM write TypeScript code that calls those APIs inside a secure sandbox. This approach aims to reduce context bloat and improve reliability, especially when multiple calls must be chained. The broader takeaway is that MCP is valuable for standardizing access to external tools, but code-first interfaces may ultimately be more robust for agent workflows.

Why does adding more MCP tools tend to make agents worse in practice?

The transcript links quality drops to “tool overload.” In examples like Trey, the agent can trigger many nested sub-agents and a long list of operations (search variants, file viewing/editing/renaming/deleting, command execution controls, plus access to external services). The claim is that agents become less reliable when the model must choose among too many actions. By contrast, Codex is shown with only four tools (shell, update plan, view image, apply patch), and it behaves significantly better. The underlying idea is that fewer, clearer tools reduce distraction and improve the model’s ability to pick the right action sequence.
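
The four-tool surface attributed to Codex can be modeled as a small discriminated union (tool names here are paraphrases, not the exact identifiers): a tight, typed action space like this leaves the model far fewer ways to choose wrong than a catalog of dozens of entries.

```typescript
// Illustrative only: a tight four-tool action space as a union type.
type Action =
  | { tool: "shell"; command: string }
  | { tool: "update_plan"; plan: string[] }
  | { tool: "view_image"; path: string }
  | { tool: "apply_patch"; patch: string };

// Exhaustive handling: the compiler enforces that every tool is covered.
function describe(a: Action): string {
  switch (a.tool) {
    case "shell": return `run: ${a.command}`;
    case "update_plan": return `plan has ${a.plan.length} steps`;
    case "view_image": return `view ${a.path}`;
    case "apply_patch": return "apply patch";
  }
}

const d = describe({ tool: "shell", command: "ls" });
```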

How does traditional MCP create context bloat and token waste?

Tool calling is described as repeated generation cycles: the model outputs a tool-call token sequence, the harness executes the tool, then feeds the tool result back into the model for another generation. Each cycle appends more content to the message history. Even if each step’s initial prompt is small, the accumulated tool outputs and intermediate reasoning can quickly push input tokens into the thousands or tens of thousands. The transcript emphasizes that models don’t “pause”; they generate until they finish, so chaining tools via multiple model runs compounds the context growth.


What is Cloudflare’s “TypeScript API + code in a sandbox” alternative to direct MCP tool calls?

Instead of exposing MCP tools as direct tool-call targets, the agent SDK fetches the MCP schema and converts it into a TypeScript API with documentation. The model is then asked to write TypeScript code that calls those APIs. That code runs in a secure sandbox isolated from the internet, with access only to the TypeScript bindings representing the MCP tools. The key benefit is that the agent can orchestrate multiple steps in one code execution, so it doesn’t need a separate model generation for every tool call.
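
The execution model can be sketched as follows. `runInSandbox` is a stand-in for a real V8 isolate (which would also deny network and filesystem access); the point is that the generated code sees only the tool bindings, chains several steps in one run, and returns only the final value to the agent.

```typescript
// Sketch of sandboxed execution; `runInSandbox` is a hypothetical stand-in.
type Api = Record<string, (arg: string) => string>;

function runInSandbox(code: (api: Api) => string, api: Api): string {
  // A real implementation would use an isolated V8 context with no
  // network access; here the closure simply only sees `api`.
  return code(api);
}

// Stubbed tool bindings derived from an MCP server.
const api: Api = {
  listIssues: () => "issue-7",
  getIssue: (id) => `details of ${id}`,
};

// "Model-written" code: chains two calls, returns only the final result.
const result = runInSandbox((a) => {
  const id = a.listIssues("");
  return a.getIssue(id);
}, api);
// The agent re-ingests just `result`, not every intermediate step.
```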

Why does the transcript claim LLMs are better at calling APIs via code than via MCP tool-call syntax?

The argument is training-data mismatch. LLMs have seen enormous amounts of real-world TypeScript/JavaScript code, but comparatively little tool-call syntax, and what they have seen comes largely from synthetic training examples. As a result, forcing the model to emit special tool-call formats can be brittle, especially with complex tool sets. Writing code to call an API, by contrast, is closer to what the model already does well, so it tends to be more reliable.
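
For illustration, here is the same request in both forms. The JSON-RPC-ish payload on top is the kind of structure a model must emit token-by-token under traditional tool calling (field names approximate the MCP wire format); the function call below is ordinary code of the kind training data is full of.

```typescript
// The tool-call form: a structured payload the model must emit exactly.
const asToolCall = JSON.stringify({
  jsonrpc: "2.0",
  method: "tools/call",
  params: { name: "searchDocs", arguments: { query: "mcp" } },
});

// The code form: a plain function call against a binding.
// `searchDocs` is a stub standing in for a generated API binding.
function searchDocs(query: string): string[] {
  return [`hit for ${query}`];
}
const asCode = searchDocs("mcp");
```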

What does MCP still do well, even if it’s treated as a patch?

MCP’s strongest value is uniformity: it standardizes how an agent discovers and connects to external APIs, including attached documentation. That means an agent can use an MCP server without the agent developer and the MCP server developer needing to coordinate beforehand. The transcript also notes a catch: authorization is not fully standardized, because auth is handled out of band, which complicates portability.

How does the transcript connect MCP to the broader “configuration as code” debate?

The skepticism is that MCP is often used to let AI control legacy systems whose configuration lives outside the codebase (e.g., dashboards, external state). The transcript argues that state outside the repo makes AI worse because it must fetch missing context before acting. The “spicy bet” is that tools requiring MCP-based configuration will struggle long-term versus systems where configuration is represented as files/folders in the codebase (with Convex offered as an example of code-local configuration).
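
A hedged illustration of "configuration in the repo": a typed config file the agent can read and edit like any other source file, with no external dashboard to fetch from. The shape loosely echoes Convex-style code-local config, but the field names here are invented.

```typescript
// Invented example of repo-local configuration as a typed file.
interface CronJob {
  name: string;
  schedule: string; // cron expression
  handler: string;  // path to the handler module
}

const crons: CronJob[] = [
  { name: "cleanup", schedule: "0 3 * * *", handler: "jobs/cleanup" },
];

// An agent needs no extra fetch step: the full state is local and
// diffable, so changes go through the same review flow as code.
const names = crons.map((c) => c.name);
```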

Review Questions

  1. What specific failure modes are attributed to MCP when agents are given many tools, and how do the transcript’s tool-list comparisons support that claim?
  2. Explain the token/context-bloat mechanism described for traditional MCP tool calling. How does the TypeScript-code approach change the execution flow?
  3. What are MCP’s main strengths (uniformity/discovery/authorization handling), and why does the transcript still call it a patch rather than an end state?

Key Points

  1. Tool overload is presented as a major reason agent quality drops: large MCP tool catalogs can distract models and worsen tool selection.

  2. Traditional MCP tool calling can inflate context because each tool call triggers another model generation cycle with accumulated intermediate results.

  3. Cloudflare’s approach converts MCP schemas into a TypeScript API and asks the LLM to write code that calls those APIs, reducing repeated reasoning/tool-call loops.

  4. Running generated code in a secure sandbox is treated as essential to safely execute eval-like behavior while limiting network and tool access.

  5. MCP’s value is framed as interoperability and uniform discovery/documentation rather than an optimal interface for model reasoning.

  6. The transcript argues long-term wins may come from code-first configuration where system state lives in the repo, not from protocols that patch over external state.

  7. The biggest reliability gains are expected when models orchestrate deterministic actions through code, not when they emit brittle special tool-call syntax repeatedly.

Highlights

Agents degrade when handed too many tools; smaller, well-scoped tool sets correlate with better behavior in the transcript’s comparisons.
Each MCP tool call can force another full model generation, causing context bloat as tool outputs accumulate across steps.
Cloudflare’s key shift: convert MCP tools into a TypeScript API, then let the LLM write TypeScript that runs once in a sandbox to orchestrate multi-step work.
MCP is praised for uniform access and documentation, but criticized as a patch for legacy systems whose configuration lives outside the codebase.
