
GPT-4.5 shocks the world with its lack of intelligence...

Fireship · 5 min read

Based on Fireship's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPT-4.5 is portrayed as extremely expensive, with pricing cited at $75 per million input tokens and $150 per million output tokens, and access limited to $200-per-month Pro users.

Briefing

OpenAI’s GPT-4.5 launch lands as a costly, underwhelming step forward, pitched mainly around “vibes” and a more natural chat style rather than measurable breakthroughs. The model is described as the most expensive OpenAI system yet, priced at $75 per million input tokens and $150 per million output tokens, with access limited to $200-per-month Pro users. That price tag matters because it raises the bar for what counts as progress: users and benchmarks are expected to show clear gains, not just subjective improvements.

The central claim in the rollout is that GPT-4.5 reduces hallucinations and performs better on a new “Vibes Benchmark” meant to capture creative thinking. In practice, the demo still produces obvious errors and inconsistent knowledge. The model reportedly makes silly mistakes, isn’t self-aware, and even misstates basic facts about itself, claiming a training cutoff of October 2023 and failing to demonstrate a coherent understanding of what “GPT-4.5” is. The transcript’s examples underline the gap between marketing and reliability: it can answer a trivia-style question about the letters in “Strawberry,” but then gives an incorrect count of the “L”s in “Lollapalooza.”

When the discussion turns to technical performance, the disappointment sharpens. The model is framed as weaker than “deep thinking” alternatives for programming and science tasks, and it performs poorly on the Aider polyglot coding benchmark, both in quality and in cost. The comparison is not just “slightly worse”: GPT-4.5 is described as “hundreds of times more expensive” than the better-performing option mentioned in the transcript.

The broader market context adds pressure. The transcript points to xAI’s Grok as the current top model in a betting-market sense, with OpenAI still favored to lead by the end of 2025 but with declining odds. That matters because OpenAI is portrayed as moving toward a for-profit structure that depends on sustaining a massive valuation while spending heavily on scaling. The transcript also references ongoing calls from tech leaders to regulate or stop training large models, and it criticizes the launch optics: Sam Altman allegedly sent interns to demo the system rather than appearing personally.

Finally, the transcript argues that the industry may be heading toward a “sigmoid of sorrow” rather than a singularity: impressive tools, but no sudden leap to artificial superintelligence. The most optimistic takeaway is narrower and practical: AI coding assistants are already powerful for real programmers, and the “plateau” is good news for students learning fundamentals. The overall verdict is that GPT-4.5 is a competent chat model, but not a benchmark-shattering advance commensurate with its cost and hype.

Cornell Notes

GPT-4.5 is presented as an expensive, hype-light upgrade that leans on subjective “vibes” and a new creativity-oriented “Vibes Benchmark” rather than clear benchmark dominance. Pricing is described as extremely high ($75 per million input tokens and $150 per million output tokens), with access limited to $200-per-month Pro users. In demos, the model still makes silly factual mistakes, misstates its own training cutoff, and shows weak performance on coding-focused evaluations like the Aider polyglot coding benchmark. The transcript frames this as a broader market signal: xAI’s Grok is viewed as leading in betting-market terms, while OpenAI’s odds decline despite heavy investment. The practical conclusion is that AI coding tools help, but they don’t replace the need for real programming skill.

Why does GPT-4.5’s pricing become a central part of the criticism?

The transcript treats cost as the yardstick for progress. GPT-4.5 is described as five times more expensive than Claude, priced at $75 per million input tokens and $150 per million output tokens, and access is limited to $200-per-month Pro users. With that level of expense, the expectation is measurable gains on benchmarks and reliability, not just a more natural chat tone.
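At those per-token rates, the cost of a single request is simple arithmetic; a minimal sketch (the token counts in the example are hypothetical, only the per-million prices come from the transcript):

```python
# Rates cited in the transcript: $75 per million input tokens,
# $150 per million output tokens.
INPUT_RATE = 75 / 1_000_000    # USD per input token
OUTPUT_RATE = 150 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API request at the cited GPT-4.5 rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10,000-token prompt with a 2,000-token reply:
print(round(request_cost(10_000, 2_000), 2))  # → 1.05
```

A dollar per mid-sized request illustrates why the transcript treats cost, not vibes, as the bar the model has to clear.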

What is the “Vibes Benchmark,” and how does the demo performance affect its credibility?

OpenAI’s rollout emphasizes “vibes” and a new “Vibes Benchmark” meant to measure creative thinking and more human-like conversation. But the demo described still produces obvious errors and inconsistent knowledge. It’s portrayed as not self-aware and as misunderstanding basic context about itself, including claiming a training cutoff of October 2023. That mismatch undermines the idea that the benchmark translates into dependable capability.

What factual and self-referential mistakes are highlighted?

The transcript gives examples of the model making silly mistakes even in simple tasks. It correctly answers a trivia-style question about the letters in “Strawberry,” then is said to give the wrong number of “L”s in “Lollapalooza.” It also reportedly doesn’t demonstrate awareness of what GPT-4.5 is and asserts an incorrect training cutoff (October 2023).
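The tasks the model reportedly failed reduce to plain character counting, which is trivial to verify in code; a quick sketch (assuming the garbled follow-up word in the transcript is “Lollapalooza”):

```python
def letter_count(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(letter_count("Strawberry", "r"))    # → 3
print(letter_count("Lollapalooza", "l"))  # → 4
```

These counting questions are a stock stress test for language models because tokenization hides individual characters from them, which is why they show up in demos like this one.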

How does GPT-4.5 fare on coding benchmarks compared with alternatives?

For programming and science, the transcript says GPT-4.5 doesn’t perform as well as “deep thinking” models like o3. On the Aider polyglot coding benchmark, it’s described as not only worse than DeepSeek but also “hundreds of times more expensive.” The combination of lower quality and much higher cost is presented as a double hit.

What market and business pressures are mentioned, and why do they matter?

The transcript claims xAI’s Grok is the best model in the world according to betting-market sentiment, while OpenAI remains favored to lead by the end of 2025 but with declining odds. It links this to OpenAI’s need to maintain a massive valuation while transitioning to for-profit status, meaning it must justify large spending with sustained performance leadership.

What does the transcript suggest about the path toward artificial superintelligence?

Instead of a rapid leap to a singularity, the transcript predicts a “sigmoid of sorrow,” implying gradual improvement without a sudden breakthrough. It also criticizes the launch optics (Sam Altman allegedly didn’t attend personally) and frames GPT-4.5 as evidence that the industry may be plateauing rather than accelerating into post-human capabilities.

Review Questions

  1. What pricing details are cited for GPT-4.5, and how do they shape expectations for benchmark performance?
  2. Which specific demo failures (including self-referential claims) are used to challenge the “vibes” narrative?
  3. How does the transcript compare GPT-4.5’s coding performance and cost against DeepSeek and other alternatives like Grok or “deep thinking” models?

Key Points

  1. GPT-4.5 is portrayed as extremely expensive, with pricing cited at $75 per million input tokens and $150 per million output tokens, and access limited to $200-per-month Pro users.

  2. The launch emphasis on subjective “vibes” and a “Vibes Benchmark” is challenged by reported demo errors and factual inconsistencies.

  3. Reported self-referential issues include claiming a training cutoff of October 2023 and lacking coherent awareness of what GPT-4.5 is.

  4. Coding performance is framed as weaker than “deep thinking” models and particularly poor on the Aider polyglot coding benchmark relative to cost.

  5. Market sentiment is said to favor xAI’s Grok at present, while OpenAI’s odds are described as declining despite continued investment.

  6. The transcript argues the industry may be moving toward a plateau rather than a singularity, with AI tools improving but not delivering sudden superintelligence.

Highlights

GPT-4.5’s pricing is positioned as the key reason the launch feels like a letdown: $75 per million input tokens and $150 per million output tokens for $200-per-month Pro access.
Even with claims of lower hallucinations and better “vibes,” the model is described as making silly mistakes and misreporting its own training cutoff (October 2023).
On coding evaluations like the Aider polyglot coding benchmark, GPT-4.5 is described as worse than DeepSeek while also costing hundreds of times more.
The transcript frames the broader moment as market pressure: xAI’s Grok is cited as leading in betting-market terms, while OpenAI’s odds decline.
The overall conclusion is a “sigmoid of sorrow” scenario—useful AI progress without a rapid leap to artificial superintelligence.

Topics

  • GPT-4.5 Pricing
  • Vibes Benchmark
  • Hallucinations
  • AI Coding Benchmarks
  • Model Market Competition
