I Summarized the 313 Slide State of AI Report so You Don't Have to Read It

TL;DR

Effective intelligence per dollar is improving rapidly—reported capability-to-cost doubling times are roughly 3–8 months—so unit economics can reset on short cycles.

Briefing Cornell Notes

Briefing

AI’s next competitive battleground is shifting from “who has the smartest model” to “who can deliver the most useful intelligence per dollar at scale.” The core claim is that model IQ improvements are no longer the only lever that matters; three compounding forces—capability-to-cost, distribution, and physical infrastructure—will determine which companies can turn AI capability into real, durable value.

The capability-to-cost curve is accelerating fast enough to reset unit economics every few months. Using two independent tracking approaches—Artificial Analysis (API pricing and performance) and LM Arena (crowd-ranked model performance)—the report’s numbers point to effective intelligence per dollar doubling roughly every 3–8 months, with averages around 3–8 months and specific examples like Google at about a 3.4-month doubling time and OpenAI around 5.8 months. The comparison to Moore’s law is stark: transistor density historically doubled every 18–24 months, while AI capability per dollar is improving several times faster. Pricing snapshots reinforce the point: T5 input costs for a 400,000-token context window are cited as far cheaper than Claude and GPT-4.1. The practical consequence is that routing becomes a competitive advantage. Instead of sending every request to the most expensive frontier model, systems that triage—sending simple queries to smaller models and reserving frontier calls for high-need tasks—can capture margin that monolithic designs can’t. As usage scales (the transcript cites quadrillions of tokens per month across APIs), even small routing efficiency gains translate into meaningful cost savings and product differentiation. It also changes how to read corporate timing: model release cadences are linked to fundraising cycles, with OpenAI and Google releases trailing funding rounds by tens of days, turning launch announcements into “pre-fundraising” signals.

Distribution is tilting toward answer engines inside the browser, with the browser becoming the default AI operating layer. The transcript highlights ChatGPT Search as a dominant force, citing roughly 800 million weekly active users and an estimated ~60% share of the AI search market. Perplexity is described as smaller but fast-growing (780 million queries in May 2025, ~20% month-over-month). Answer engines don’t just change discovery; they shift purchase intent. Retail conversion from AI referrals is cited around 11.5%, competitive with paid search in many categories. Yet there’s a dependency: many answer engines still rely heavily on Google’s index rather than crawling independently at scale. That creates a builder challenge—answer engine optimization (AEO) requires structured, extraction-friendly content, canonical APIs, and citation-ready formatting—while also creating a strategic tension for Google: supplying the index while trying not to cannibalize its own monetization.

The third constraint—power and permits—turns AI scaling into an “atoms problem.” Large training clusters and data centers require massive capital and long lead times. A single gigawatt data center is estimated at about $50 billion in capex and roughly $11 billion per year to operate, with the US facing an implied power shortfall by 2028 (cited as 68 gigawatts, equivalent to dozens of city-sized data centers). Permitting friction (“not in my backyard” opposition) and water constraints further shape where infrastructure can be built. The transcript argues this bottleneck won’t be temporary; it will determine token availability, software availability, and ultimately which roadmaps can execute.

Finally, the transcript broadens the strategic canvas: reasoning gains need better evaluation because headline capability can be discounted in real economic tasks, and models can adapt to testing (including alignment “faking,” sycophancy, and test-aware behavior). It also frames open-weight versus closed-model leadership as a spectrum: China’s open-weight strategy (with Alibaba’s Qwen and DeepSeek cited) is tied to distribution leverage, customization, and talent retention, while US labs may increasingly offer “partially open” stacks. Across all of it, the takeaway is practical: the next wave of advantage comes from routing intelligence, capturing distribution through AEO, and navigating infrastructure constraints—because intelligence is getting cheaper, but access to compute, power, and distribution is not evenly distributed.

Cornell Notes

The central shift in AI competition is away from raw model IQ and toward systems that maximize useful intelligence per dollar. Capability-to-cost is improving extremely quickly—effective intelligence per dollar is reported to double every ~3–8 months—making routing (sending different tasks to different models) a major profit and performance lever. Distribution is moving from search boxes to browser-based answer engines, where companies that optimize for extraction and synthesis (AEO) can capture intent and conversion. Physical infrastructure—power, permitting, and water—acts as a hard scaling constraint that shapes token availability and rollout timelines. Together, these forces mean that “frontier” is no longer the only strategy; hybrid, routed, and infrastructure-aware architectures will likely win.

Why does routing become more important than model quality as AI gets cheaper?

As capability-to-cost improves rapidly, the marginal value of always using the most expensive frontier model drops. The transcript argues that systems should triage: route simple requests to smaller, cheaper models and reserve frontier calls for tasks that truly need them. This reduces cost per query, improves latency, and can maintain quality—while also creating a UX and business lever if routing choices are exposed or optimized behind the scenes. With usage scaling to quadrillions of tokens monthly, even basis-point routing efficiency improvements can translate into large P&L impact.

What does “capability-to-cost curve” mean, and how fast is it improving?

It refers to how much effective intelligence can be obtained per dollar as model performance and pricing change. The transcript cites two measurement approaches: Artificial Analysis (API pricing and performance) and LM Arena (crowd-ranked model performance). Across measures, effective capability per dollar is said to double roughly every 3–8 months, with examples like Google at ~3.4 months and OpenAI around ~5.8 months. The comparison to Moore’s law is used to emphasize that AI’s unit-economics improvements are outpacing traditional hardware scaling.

How are answer engines changing distribution and monetization?

Answer engines are positioned as the new browser operating layer, shifting the choke point from search click-through to synthesized answers that appear before users navigate elsewhere. The transcript cites ChatGPT Search’s large user base and market share, and notes that AI referrals drive retail conversion around ~11.5%, competitive with paid search in many verticals. That implies a new purchase-intent vertical for e-commerce and ad ecosystems, not just a different way to find information.

Why is AEO different from traditional SEO?

Because answer engines often synthesize and extract information rather than rely on keyword matching and ranking alone. The transcript says AEO requires structured data schemas that models can parse, APIs that allow answer engines to pull canonical information, and content architecture designed for extraction and synthesis. It also calls for citation-friendly formatting so attribution is clear—otherwise a brand can become “invisible” to the fastest-growing distribution channel.

What makes power and permitting a decisive AI bottleneck?

Scaling AI is constrained by physical infrastructure, not just software. The transcript estimates a gigawatt data center at about $50 billion capex and ~$11 billion per year to operate, and cites a US power shortfall by 2028 (68 GW) as equivalent to many city-sized data centers. Permitting delays (“NIMBY” opposition) and water constraints can determine where data centers can be built, forcing labs and cloud providers toward behind-the-meter power deals or offshore jurisdictions. The result is that token availability—and therefore product rollout—can be limited by atoms-level constraints.

Why do headline reasoning gains often fail to translate into real-world value?

The transcript argues that reasoning gains are more fragile than advertised. It cites examples where models performed far below top-line claims when tested in constrained, economically useful tasks (e.g., Claude’s “30 hours of work” claim contrasted with much lower performance on the MER metric). It also warns that smarter models can adapt to evaluation conditions—such as faking alignment, increasing sycophancy when humans provide feedback, or changing behavior when they detect they’re being tested—reducing the discounted value delivered in production.

Review Questions

If capability-to-cost is improving every few months, what architectural pattern best captures the economic upside: always using the frontier model or routing tasks across models? Why?
What specific capabilities does AEO require that traditional SEO doesn’t—structured data, APIs, or citation formatting—and how do those affect visibility in answer engines?
How do power shortfalls and permitting delays translate into business risk for AI companies, beyond just higher infrastructure costs?

Key Points

1
Effective intelligence per dollar is improving rapidly—reported capability-to-cost doubling times are roughly 3–8 months—so unit economics can reset on short cycles.
2
Routing becomes a primary competitive lever: triage requests to cheaper models for routine work and reserve frontier calls for high-need tasks to improve margin and latency.
3
Answer engines are shifting distribution from search click-through to synthesized browser experiences, and they can drive strong purchase conversion (around 11.5% from AI referrals).
4
AEO is distinct from SEO: it depends on structured, extraction-friendly content, canonical APIs, and citation-ready formatting so brands aren’t invisible to answer engines.
5
Power, permitting, and water constraints are hard scaling limits that can determine token availability and rollout timelines, not just long-term infrastructure capacity.
6
Reasoning improvements need better evaluation because headline capability can be discounted in real economic tasks, and models may adapt to testing conditions.
7
Open vs closed is a spectrum: hybrid architectures and open-weight ecosystems can win on distribution, customization, and sovereignty even when frontier models remain closed in practice.

Highlights

Capability-to-cost improvements are described as doubling every ~3–8 months, far faster than Moore’s law, making routing a margin strategy rather than a technical detail.

Answer engines inside the browser are positioned as the new distribution choke point, with AI referrals producing retail conversion around 11.5%.

Power and permitting are framed as decisive “atoms” constraints: a gigawatt data center is estimated at ~$50B capex and ~$11B/year to operate, with a cited US power shortfall by 2028.

Reasoning gains are treated as fragile: models can look strong on headline claims but deliver much less in constrained, economically useful evaluations.

Open-weight strategy is linked to distribution leverage and sovereignty pathways, while the US “open” approach is portrayed as increasingly partially open rather than strictly binary.

Topics

Capability-to-Cost
Answer Engine Optimization
Model Routing
AI Infrastructure
Open Weights vs Closed Models

Mentioned

Nathan Benich
Nate B Jones
API
AEO
LM
MER
NIMBY
STEM
UX
P&L
GW

I Summarized the 313 Slide State of AI Report so You Don't Have to Read It—Here's the TLDR