I was wrong about GPT-5
Based on Theo - t3․gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
GPT-5’s public rollout is being blamed for a sharp mismatch between early testing and what many users experience now—especially on ChatGPT.com and in coding tools like Cursor. The core claim is that the “good” GPT-5 behavior Theo - t3․gg saw through API access and specific endpoints has degraded or is not being delivered to most users via the default ChatGPT interface, leaving people with slower, less capable responses and more frequent quality failures.
A major reason offered for the confusion is an “auto router” system that decides whether to use GPT-5 reasoning or a faster “quick answer” mode. According to the account, most users—particularly on free tiers—end up routed to the least capable configuration. That design is meant to balance cost and latency, but it also changes the feel of the model: reasoning takes longer and produces no immediate output, while many ChatGPT users are accustomed to instant responses. The result is a model that can feel “dumber” or less interactive, even when the underlying GPT-5 capability exists elsewhere.
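The routing trade-off described above can be made concrete with a toy sketch. Everything here is hypothetical — the tier names, the `route_request` function, and the complexity heuristic are invented for illustration and bear no relation to OpenAI's actual router, whose logic is not public.

```python
from dataclasses import dataclass

# Hypothetical auto-router sketch — NOT OpenAI's implementation.
# Tier names, heuristics, and thresholds are invented for illustration.

@dataclass
class Request:
    prompt: str
    user_tier: str  # "free" or "paid" (assumed tiers)

def route_request(req: Request) -> str:
    """Pick a model configuration, trading reasoning quality for latency/cost."""
    # Free-tier traffic is biased toward the cheap, fast path.
    if req.user_tier == "free":
        return "quick-answer"
    # A crude complexity heuristic: long prompts or code-like content
    # get routed to the slower reasoning configuration.
    looks_complex = len(req.prompt) > 500 or "```" in req.prompt
    return "reasoning" if looks_complex else "quick-answer"

print(route_request(Request("hi", "free")))       # quick-answer
print(route_request(Request("x" * 600, "paid")))  # reasoning
```

The point of the sketch is the shape of the failure mode: any heuristic like this silently sends most casual traffic down the fast path, so two users typing the same prompt can see very different models.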
The rollout also allegedly hid or deprecated other GPT-5 variants in the UI, disabling access for free users and limiting options even for paid tiers. The complaint is that simplifying the model picker and naming scheme backfired: the system was supposed to “figure it all out,” but the experience people report doesn’t match the early access experience. Theo - t3․gg says the auto router and interface changes together explain why users who tried GPT-5 after launch often didn’t see the same performance.
Beyond product mechanics, the account argues that tool integration and version clarity have worsened the situation. Cursor and OpenAI reportedly collaborated ahead of time, yet the author claims the current Cursor experience is “nowhere near” the earlier one. The degradation is described as noticeable within days—launch day felt strong, then quality and speed dropped. The author cites repeated tests using the same prompts and code workflows, including image generation and coding tasks, where outputs allegedly lost gradients, introduced rendering glitches, and produced worse UI edits.
The transcript also addresses credibility attacks. Accusations that the author is a paid OpenAI shill are rejected with financial context: the author says they were not paid for the reaction video (though an appearance fee was offered for a different launch video) and claims they are down roughly $25,000 on inference costs with T3 Chat. Still, the author admits a key mistake: publishing the GPT-5 reaction video before monitoring public sentiment, while being away at DEFCON during launch.
Finally, the account broadens the critique. It argues that GPT-5’s behavior requires different prompting and system design than prior models, and that many developers had to build workarounds for other labs’ “agentic” quirks—yet GPT-5’s instruction-following changes weren’t communicated clearly enough. The author concludes that OpenAI botched the launch, that ChatGPT.com’s experience is unacceptable right now, and that the best GPT-5 performance may be reachable mainly through API endpoints and the right configurations until the defaults and routing improve.
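The closing point — that the best behavior may be reachable mainly by pinning a model and configuration explicitly rather than trusting defaults — can be sketched as a request builder. The parameter shape (`model`, a `reasoning` object with an `effort` field) follows OpenAI's Responses API, but the allowed effort levels and the helper itself are assumptions; check the current API reference before relying on them.

```python
# Sketch of explicitly requesting a reasoning-heavy GPT-5 configuration
# instead of relying on default routing. Parameter names follow OpenAI's
# Responses API, but verify against current docs; the effort levels
# listed here are assumptions.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build keyword arguments for an explicit, non-routed API call."""
    allowed = {"minimal", "low", "medium", "high"}  # assumed effort levels
    if effort not in allowed:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "gpt-5",                 # pin a specific model, not an auto-router
        "reasoning": {"effort": effort},  # ask for full reasoning explicitly
        "input": prompt,
    }

# With the official SDK this would be used roughly as:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.responses.create(**build_request("Refactor this function..."))
kwargs = build_request("Refactor this function...")
print(kwargs["model"], kwargs["reasoning"]["effort"])  # gpt-5 high
```

The design choice mirrors the transcript's advice: when the default path routes you somewhere unpredictable, make every capability-affecting parameter explicit in the call.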
Cornell Notes
GPT-5’s early “wow” performance is contrasted with a worse current experience for many users, especially on ChatGPT.com and in Cursor. The transcript attributes the gap largely to an auto-routing system that decides when to enable reasoning versus a fast “quick answer” mode, which can leave most users on a less capable configuration. It also claims OpenAI simplified the model options by hiding/deprecating other variants in the UI, making it harder for users to access the configurations that matched early testing. The author further reports quality regressions over days using repeated prompts and code/image workflows, and argues that tool integration and unclear version/parameter differences contribute to the mismatch. The practical takeaway: the “good” GPT-5 behavior may exist, but the default user path may not be delivering it yet.
- What mechanism is blamed for most users not getting the “smart” GPT-5 experience?
- Why does the transcript claim the default ChatGPT experience feels worse even if GPT-5 capability exists?
- What evidence does the author use to argue quality dropped after launch?
- How does the transcript address accusations of being an OpenAI paid shill?
- What does the transcript say about GPT-5 variants and why users may be confused?
- Why does the transcript argue developers can’t just swap model strings in tools?
Review Questions
- What role does auto router play in determining whether GPT-5 reasoning is used, and how does that affect user-perceived quality?
- How does the transcript connect UI changes (hiding/deprecating variants) to the gap between early testing and current user experiences?
- What kinds of repeated tests (prompts, workflows, tools) are cited as evidence of quality regression after launch?
Key Points
1. Auto router is presented as the main reason many users get a less reasoning-heavy GPT-5 configuration by default, especially on free tiers.
2. The ChatGPT.com experience is claimed to be worse than API/endpoint access because routing and “quick answer” behavior sacrifice reasoning quality for lower latency.
3. The rollout is criticized for hiding or deprecating other GPT-5 variants in the UI, limiting users’ ability to reach the configurations that matched early testing.
4. Quality and speed are described as degrading within days, with repeated prompts and workflows allegedly producing worse outputs in Cursor.
5. The transcript admits a credibility-risk mistake: publishing before monitoring public sentiment while away at DEFCON during launch.
6. Tool integration and unclear “version”/parameter differences (e.g., “juice,” fast vs. high reasoning) are blamed for making GPT-5 harder to use effectively out of the box.
7. Accusations of paid shilling are rejected with claims of non-payment for the reaction video and significant inference costs borne by the author’s T3 Chat work.