
Why Does OpenAI Need a 'Stargate' Supercomputer? Ft. Perplexity CEO Aravind Srinivas

AI Explained · 6 min read

Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Stargate is portrayed as a compute-scale investment whose continuation depends on OpenAI delivering measurable capability improvements tied to large compute jumps.

Briefing

OpenAI’s planned “Stargate” supercomputer is framed as a compute arms race and an AGI accelerant: Microsoft’s willingness to fund a massive new cluster hinges on OpenAI delivering meaningful capability jumps, and those jumps are expected to track with large, near-term increases in training and inference power. The centerpiece claim is that Stargate would deliver at least a 100x jump in compute—“orders of magnitude” beyond what Microsoft currently supplies—while landing around 2028, with earlier stages coming online sooner. More compute, in this view, translates directly into stronger “frontier” models, which then become the substrate for whatever comes next in the 1–4 year AI timeline.

The argument starts with a conditional: Stargate moves forward only if OpenAI can improve its models enough to justify the investment. That improvement is tied to expected model milestones—GPT-4.5 “in the spring,” GPT-5 “at the end of this year or the beginning of next,” and then even later generations. A key supporting thread is a claim that such a project is “absolutely required” for artificial general intelligence, defined less as a single benchmark and more as the kind of system people would feel comfortable hiring for most jobs.

Skepticism about AGI timelines is met with a practical counterpoint: if AGI were truly imminent, hiring at the current pace would be unnecessary. Perplexity CEO Aravind Srinivas is used to press that question—why keep scaling teams and operations if a near-term AGI breakthrough is already within reach? The transcript also emphasizes the non-glamorous reality of running frontier systems: clusters must be maintained, GPUs selected, failures handled, and production code debugged—tasks that still require human operators, even if models become more capable.

The compute story is paired with an energy story. A “mathematical discrepancy” is flagged: Stargate’s compute gains are described as enormous, yet the power draw is said to be comparable to running several large data centers today. The response leans on a semiconductor trend from TSMC: energy-efficient performance is projected to improve roughly 3x every two years, implying chips could be nearly 10x more energy efficient by 2028. That efficiency curve is presented as the bridge between “100x compute” and “manageable watts.”
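The efficiency claim above is simple compound growth, and the numbers can be checked directly. A minimal sketch of the arithmetic, assuming a 2024 baseline year (the transcript does not state one explicitly):

```python
# Sketch of the transcript's efficiency arithmetic (assumed 2024 baseline).
# Cited TSMC trend: ~3x energy-efficient performance every two years.
def projected_efficiency_gain(start_year: int, end_year: int,
                              factor_per_period: float = 3.0,
                              period_years: int = 2) -> float:
    periods = (end_year - start_year) / period_years
    return factor_per_period ** periods

gain = projected_efficiency_gain(2024, 2028)  # 3^2 = 9, i.e. "nearly 10x"
print(gain)  # 9.0

# Implication: a ~100x compute jump on ~9x more efficient chips needs
# roughly 100/9 ≈ 11x today's power, not 100x — the claimed bridge
# between "100x compute" and "manageable watts".
print(round(100 / gain, 1))  # 11.1
```

Note that even with the efficiency curve, the power budget still grows by an order of magnitude; the trend narrows the gap rather than closing it.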

Beyond raw competition with Google, the transcript lays out multiple reasons for Stargate. One is capacity parity: Google is portrayed as having more near-term compute and more AI server chips than OpenAI, with Microsoft leadership describing the strategic advantage as compute, data, and IP rather than personnel alone. Another reason is to train larger future model families (the transcript name-checks GPT-7, GPT-7.5, and GPT-8 as targets for later training cycles). A third reason is “long inference”—letting models think longer via chain-of-thought search, framed as a way to boost reliability and unlock capabilities that show up in demos where responses arrive after sustained reasoning.

Finally, Stargate is positioned as a multimodal platform. The transcript points to OpenAI’s voice system (described as able to imitate a voice from about 15 seconds of audio) and to text-to-video generation exemplified by Sora, arguing that more compute supports richer audio/video/robotics capabilities—along with the risks of high-fidelity impersonation and deepfake-like content. The overall takeaway is that Stargate is less about a single product like Sora or a voice feature and more about manufacturing intelligence at a scale that could reshape what AI can do across tasks, timelines, and modalities.

Cornell Notes

Stargate is presented as a compute-scale project meant to keep OpenAI competitive and accelerate the path toward general-purpose, high-capability AI. The core claim is that the planned supercomputer would deliver at least a 100x jump in compute versus current supplies, with a target launch around 2028, while power demands remain comparable to several major data centers thanks to expected chip efficiency gains from TSMC. The transcript argues that capability improvements track with compute, and that future model generations (GPT-4.5, GPT-5, and later) require that scale. It also links Stargate to “long inference” (letting models reason longer) and to multimodal systems such as voice and text-to-video, where more compute can improve quality and reliability. The stakes extend beyond performance to operational realities and risks like voice impersonation.

Why is Stargate framed as necessary for OpenAI’s next steps rather than just another hardware upgrade?

The transcript ties Stargate to a conditional: Microsoft’s funding depends on OpenAI producing meaningful capability gains. Those gains are portrayed as compute-dependent—Stargate would provide “orders of magnitude” more compute (at least ~100x) than current supplies, which is expected to translate into stronger frontier models. The plan is also positioned as a response to competitive pressure, especially from Google’s reported near-term compute and chip availability. In short, Stargate is treated as the enabling layer for training bigger future models and sustaining rapid iteration.

How does the transcript reconcile “100x more compute” with energy and power constraints?

It flags a potential mismatch: Stargate’s energy use is said to be similar to running several large data centers today, even though compute would rise dramatically. The proposed explanation relies on semiconductor progress from TSMC: energy-efficient performance is projected to improve about 3x every two years, reaching nearly 10x better efficiency by 2028. That efficiency improvement is used to argue that the same (or similar) power budget can support far more compute by the time Stargate is operational.

What role does “long inference” play in the Stargate rationale?

Long inference is described as letting models think longer before responding—analogous to AlphaGo-style pondering and search over multiple reasoning branches. The transcript’s example from Aravind Srinivas frames reliability improvements as coming from searching a tree of possible reasoning chains and selecting the most probable explanation, rather than decoding a single chain quickly. The expected outcome is demos where answers arrive after longer delays, potentially enabling breakthroughs that feel more “general” even if they don’t immediately replace every job function.
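The search-and-select shape described above can be sketched in a few lines. This is a toy illustration only: the "model" is a stand-in that emits random reasoning steps and a fake log-probability, so it shows the structure (sample many chains, score each, keep the best) rather than any real decoding loop.

```python
import random

# Toy sketch of "long inference": rather than decoding a single reasoning
# chain quickly, sample several candidate chains, score each, and keep the
# most probable. sample_chain and score are hypothetical stand-ins for a
# real model's decoding and log-likelihood.

def sample_chain(rng: random.Random) -> list[str]:
    # Pretend each candidate is a short sequence of reasoning steps.
    return [f"step-{rng.randint(0, 9)}" for _ in range(3)]

def score(chain: list[str]) -> float:
    # Stand-in for the model's log-probability of a chain; seeded by the
    # chain's contents so selection is deterministic and reproducible.
    chain_rng = random.Random("".join(chain))
    return -sum(chain_rng.random() for _ in chain)

def long_inference(n_branches: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    candidates = [sample_chain(rng) for _ in range(n_branches)]
    # More branches means more "thinking time" before one answer emerges.
    return max(candidates, key=score)

best = long_inference(n_branches=16)
print(best)
```

The reliability argument maps onto the `n_branches` knob: widening the search trades latency for a better chance that the selected chain is a sound one.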

Why does the transcript use hiring and operational staffing as a reality check on AGI timelines?

It raises a skeptical question: if AGI were truly imminent, why would OpenAI be hiring at a rapid rate (the transcript cites dozens of hires per month)? The argument is that hiring thousands over several years would be unnecessary if a near-term AGI breakthrough eliminated the need for large teams. The transcript also emphasizes that running frontier systems requires human labor—maintaining clusters, handling GPU/node failures, and debugging production code—tasks that remain difficult to fully automate in the near term.

What competitive dynamic with Google is highlighted?

The transcript claims Google has more near-term computing capacity and more AI server chips than OpenAI, citing complaints about chip availability and an “Insider” chart from SemiAnalysis. It also quotes Microsoft leadership describing strategic advantages as compute, data, IP, and people—suggesting that supercomputers like Stargate are central to outpacing rivals. Stargate is therefore framed as a capacity-matching move as well as a capability-building one.

How does the transcript connect Stargate to multimodal capabilities and associated risks?

It links more compute to richer multimodal outputs: OpenAI’s voice engine is described as able to imitate a voice from roughly 15 seconds of audio, and Sora is used as an example of photorealistic text-to-video generation. The transcript argues that scaling compute supports both quality improvements and broader modality coverage (audio/video/possibly robotics). It also flags risks such as voice impersonation and deepfake-like misuse, implying that capability gains come with societal and security challenges.

Review Questions

  1. What compute increase does the transcript attribute to Stargate, and how does it argue that power usage can remain roughly comparable?
  2. How does “long inference” differ from faster, single-pass responses, and what reliability mechanism is described?
  3. Which operational and staffing factors does the transcript use to challenge overly optimistic AGI timelines?

Key Points

  1. Stargate is portrayed as a compute-scale investment whose continuation depends on OpenAI delivering measurable capability improvements tied to large compute jumps.
  2. The plan is described as delivering at least ~100x more compute than current supplies, with earlier stages coming online before a likely 2028 launch.
  3. Energy constraints are addressed by projecting major chip efficiency gains from TSMC by 2028, helping reconcile higher compute with similar power budgets.
  4. Competitive pressure—especially from Google’s reported compute and chip availability—is presented as a key driver for Microsoft’s involvement.
  5. The transcript links Stargate to both training larger future model generations and improving “long inference,” where models reason longer via chain-of-thought search.
  6. Operational realities (cluster maintenance, GPU/node failures, production debugging) are used to argue that human teams remain necessary even as models improve.
  7. Multimodal scaling is emphasized through voice imitation and text-to-video, alongside risks like high-fidelity impersonation.

Highlights

Stargate is framed as at least a 100x compute leap—“orders of magnitude” beyond current supplies—positioned as the lever for frontier model capability gains.
A TSMC efficiency projection (about 3x every two years) is used to explain how far more compute could fit within similar power constraints by 2028.
Long inference is described as a reliability upgrade: search over multiple reasoning branches before producing an answer.
Voice imitation is highlighted as a capability with real misuse potential, with the transcript describing high-fidelity cloning from about 15 seconds of audio.
The hiring-rate argument is used as a check on AGI timelines: rapid scaling of teams would be hard to justify if AGI were truly imminent.
