I was wrong about GPT-5
Based on Theo - t3․gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
GPT-5’s public rollout is being blamed for a sharp mismatch between early testing and what many users experience now—especially on ChatGPT.com and in coding tools like Cursor. The core claim is that the “good” GPT-5 behavior Theo - t3․gg saw through API access and specific endpoints has degraded or is not being delivered to most users via the default ChatGPT interface, leaving people with slower, less capable responses and more frequent quality failures.
A major reason offered for the confusion is an “auto router” system that decides whether to use GPT-5 reasoning or a faster “quick answer” mode. According to the account, most users—particularly on free tiers—end up routed to the least capable configuration. That design is meant to balance cost and latency, but it also changes the feel of the model: reasoning takes longer and produces no immediate output, while many ChatGPT users are accustomed to instant responses. The result is a model that can feel “dumber” or less interactive, even when the underlying GPT-5 capability exists elsewhere.
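The routing trade-off described above can be made concrete with a toy sketch. Everything here is hypothetical — the tier names, the `route_request` function, and the complexity heuristic are invented for illustration and bear no relation to OpenAI's actual router, whose logic is not public.

```python
from dataclasses import dataclass

# Hypothetical auto-router sketch — NOT OpenAI's implementation.
# Tier names, heuristics, and thresholds are invented for illustration.

@dataclass
class Request:
    prompt: str
    user_tier: str  # "free" or "paid" (assumed tiers)

def route_request(req: Request) -> str:
    """Pick a model configuration, trading reasoning quality for latency/cost."""
    # Free-tier traffic is biased toward the cheap, fast path.
    if req.user_tier == "free":
        return "quick-answer"
    # A crude complexity heuristic: long prompts or code-like content
    # get routed to the slower reasoning configuration.
    looks_complex = len(req.prompt) > 500 or "```" in req.prompt
    return "reasoning" if looks_complex else "quick-answer"

print(route_request(Request("hi", "free")))       # quick-answer
print(route_request(Request("x" * 600, "paid")))  # reasoning
```

The point of the sketch is the shape of the failure mode: any heuristic like this silently sends most casual traffic down the fast path, so two users typing the same prompt can see very different models.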
The rollout also allegedly hid or deprecated other GPT-5 variants in the UI, disabling access for free users and limiting options even for paid tiers. The complaint is that simplifying the model picker and naming scheme backfired: the system was supposed to “figure it all out,” but the experience people report doesn’t match the early access experience. Theo - t3․gg says the auto router and interface changes together explain why users who tried GPT-5 after launch often didn’t see the same performance.
Beyond product mechanics, the account argues that tool integration and version clarity have worsened the situation. Cursor and OpenAI reportedly collaborated ahead of time, yet the author claims the current Cursor experience is “nowhere near” the earlier one. The degradation is described as noticeable within days—launch day felt strong, then quality and speed dropped. The author cites repeated tests using the same prompts and code workflows, including image generation and coding tasks, where outputs allegedly lost gradients, introduced rendering glitches, and produced worse UI edits.
The transcript also addresses credibility attacks. Accusations that the author is a paid OpenAI shill are rejected with financial context: the author says they were not paid for the reaction video (though an appearance fee was offered for a different launch video) and claims they are down roughly $25,000 on inference costs with T3 Chat. Still, the author admits a key mistake: publishing the GPT-5 reaction video before monitoring public sentiment, while being away at DEFCON during launch.
Finally, the account broadens the critique. It argues that GPT-5’s behavior requires different prompting and system design than prior models, and that many developers had to build workarounds for other labs’ “agentic” quirks—yet GPT-5’s instruction-following changes weren’t communicated clearly enough. The author concludes that OpenAI botched the launch, that ChatGPT.com’s experience is unacceptable right now, and that the best GPT-5 performance may be reachable mainly through API endpoints and the right configurations until the defaults and routing improve.
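The closing point — that the best behavior may be reachable mainly by pinning a model and configuration explicitly rather than trusting defaults — can be sketched as a request builder. The parameter shape (`model`, a `reasoning` object with an `effort` field) follows OpenAI's Responses API, but the allowed effort levels and the helper itself are assumptions; check the current API reference before relying on them.

```python
# Sketch of explicitly requesting a reasoning-heavy GPT-5 configuration
# instead of relying on default routing. Parameter names follow OpenAI's
# Responses API, but verify against current docs; the effort levels
# listed here are assumptions.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build keyword arguments for an explicit, non-routed API call."""
    allowed = {"minimal", "low", "medium", "high"}  # assumed effort levels
    if effort not in allowed:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "gpt-5",                 # pin a specific model, not an auto-router
        "reasoning": {"effort": effort},  # ask for full reasoning explicitly
        "input": prompt,
    }

# With the official SDK this would be used roughly as:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.responses.create(**build_request("Refactor this function..."))
kwargs = build_request("Refactor this function...")
print(kwargs["model"], kwargs["reasoning"]["effort"])  # gpt-5 high
```

The design choice mirrors the transcript's advice: when the default path routes you somewhere unpredictable, make every capability-affecting parameter explicit in the call.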
Cornell Notes
GPT-5’s early “wow” performance is contrasted with a worse current experience for many users, especially on ChatGPT.com and in Cursor. The transcript attributes the gap largely to an auto-routing system that decides when to enable reasoning versus a fast “quick answer” mode, which can leave most users on a less capable configuration. It also claims OpenAI simplified the model options by hiding/deprecating other variants in the UI, making it harder for users to access the configurations that matched early testing. The author further reports quality regressions over days using repeated prompts and code/image workflows, and argues that tool integration and unclear version/parameter differences contribute to the mismatch. The practical takeaway: the “good” GPT-5 behavior may exist, but the default user path may not be delivering it yet.
- What mechanism is blamed for most users not getting the “smart” GPT-5 experience?
- Why does the transcript claim the default ChatGPT experience feels worse even if GPT-5 capability exists?
- What evidence does the author use to argue quality dropped after launch?
- How does the transcript address accusations of being an OpenAI paid shill?
- What does the transcript say about GPT-5 variants and why users may be confused?
- Why does the transcript argue developers can’t just swap model strings in tools?
Review Questions
- What role does auto router play in determining whether GPT-5 reasoning is used, and how does that affect user-perceived quality?
- How does the transcript connect UI changes (hiding/deprecating variants) to the gap between early testing and current user experiences?
- What kinds of repeated tests (prompts, workflows, tools) are cited as evidence of quality regression after launch?
Key Points
1. Auto router is presented as the main reason many users get a less reasoning-heavy GPT-5 configuration by default, especially on free tiers.
2. The ChatGPT.com experience is claimed to be worse than API/endpoint access because routing and “quick answer” behavior sacrifice reasoning quality for lower latency.
3. The rollout is criticized for hiding or deprecating other GPT-5 variants in the UI, limiting users’ ability to reach the configurations that matched early testing.
4. Quality and speed are described as degrading within days, with repeated prompts and workflows allegedly producing worse outputs in Cursor.
5. The transcript admits a credibility-risk mistake: publishing before monitoring public sentiment while away at DEFCON during launch.
6. Tool integration and unclear “version”/parameter differences (e.g., “juice,” fast vs. high reasoning) are blamed for making GPT-5 harder to use effectively out of the box.
7. Accusations of paid shilling are rejected with claims of non-payment for the reaction video and significant inference costs borne by the author’s T3 Chat work.