ChatGPT 4.5 explained: OpenAI product strategy, model pricing and feature breakdown

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPT-4.5 is described as significantly more expensive to serve than Claude 3.7 Sonnet, with cited output around $150 per million tokens and input around $75 per million tokens.

Briefing

OpenAI’s GPT-4.5 rollout is being tightly gated and priced for compute-heavy capability: output runs at about $150 per million tokens and input at about $75 per million tokens—roughly 10x and 25x higher than Claude 3.7 Sonnet’s cited $15 output and $3 input per million tokens. That cost gap isn’t just a billing detail; it signals that GPT-4.5 is expensive to serve and likely requires substantial new infrastructure. The immediate consequence is product strategy: access is limited to Pro Plan users first, with Plus users expected to get it later, alongside plans to add “tens of thousands of GPUs” to handle demand.
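
To make the gap concrete, here is a minimal back-of-the-envelope sketch using only the per-million-token prices cited above; the example request size is an illustrative assumption, not a figure from the transcript.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# cited above (USD). The 2,000-in / 1,000-out request is illustrative only.

PRICES = {
    "gpt-4.5":           {"input": 75.0, "output": 150.0},
    "claude-3.7-sonnet": {"input": 3.0,  "output": 15.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the cited per-million-token rates."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# Example: a 2,000-token prompt producing a 1,000-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")

# The cited gap: ~25x on input, ~10x on output.
print("input ratio:", PRICES["gpt-4.5"]["input"] / PRICES["claude-3.7-sonnet"]["input"])
print("output ratio:", PRICES["gpt-4.5"]["output"] / PRICES["claude-3.7-sonnet"]["output"])
```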

The confusion around GPT-4.5 stems from how people are judging it. Early comparisons risk treating GPT-4.5 as a standalone benchmark winner, when the framing here is different: GPT-4.5 is positioned as a “last Lego block” for building a more sticky, end-to-end ChatGPT experience over time. The model’s value is described less in terms of headline benchmark scores and more in real-world interaction qualities—emotional intelligence, nuanced writing style, and the ability to surprise users. Those traits may not show up cleanly in standard eval suites, but they matter for retention because they shape how users experience the system day to day.

This strategy also reflects competitive positioning. ChatGPT is portrayed as the market leader with a massive existing user base, so it can’t afford to be narrowly specialized. Claude, by contrast, is framed as a challenger that benefits from specialization—particularly code—so it can “reason” selectively depending on what users need. The implication is that OpenAI’s path is to cover all bases with a general model that still delivers distinctive interaction behaviors, while also lowering long-term compute costs through hybridization with other models already in the lineup.

GPT-4.5 is therefore treated as a complex primitive—compute-intensive capabilities that can later be combined and optimized. The long-term bet is that OpenAI can bring down inference costs while preserving the higher-level experience traits, potentially leading to a future “GPT-5 by Q2” that combines emotional intelligence with reasoning and other components. The transcript also argues that current evaluation methods are insufficient for capturing these user-facing qualities, pointing to the need for better real-world assessments.

A parallel example is Claude 3.7 Sonnet, described as more opinionated about code structure. That kind of behavioral “scaffolding” can speed users up, even if it doesn’t register as a win in traditional evals. The takeaway: both GPT-4.5 and Claude 3.7 Sonnet are being judged on dimensions that matter to builders and users—how the models behave in practice—not just what they score on paper. For now, GPT-4.5 is available only to Pro Plan users, with Plus access coming next week.

Cornell Notes

GPT-4.5 is priced and gated in a way that signals heavy compute requirements: about $150 per million tokens for output and $75 per million for input, far higher than the cited Claude 3.7 Sonnet input/output costs. Instead of treating GPT-4.5 as a single benchmark milestone, it’s framed as a “Lego block” for ChatGPT’s long-term product stickiness—especially through user-facing behaviors like emotional intelligence, nuanced writing, and the ability to surprise. Those qualities may not show up well in standard evals, so real-world conversations and evaluations are emphasized. The strategy also aims to hybridize capabilities with other models to reduce costs over time, potentially feeding into a future GPT-5 direction. Access starts with Pro Plan users, with Plus next week.

Why does GPT-4.5’s pricing matter for how it’s being rolled out?

The transcript ties pricing directly to compute cost and infrastructure needs. GPT-4.5’s output is cited at about $150 per million tokens and its input at about $75 per million tokens, contrasted with Claude 3.7 Sonnet’s cited $15 output and roughly $3 input per million tokens. Because serving GPT-4.5 is expensive, access is limited to Pro Plan users first, with Plus delayed. The plan also mentions adding tens of thousands of GPUs to support the model’s compute demands.

What’s the core reason GPT-4.5 is framed as more than a benchmark upgrade?

Early judging based on what’s released immediately can miss the point. GPT-4.5 is positioned as a foundational capability—an expensive primitive—that improves the overall ChatGPT experience through emotional intelligence, nuanced writing style, and the ability to surprise users. Those traits are described as hard to measure with standard benchmark evals but show up in real-world interactions, which affects user retention and long-term product success.

How does the transcript explain the difference between OpenAI’s and Anthropic’s strategies?

ChatGPT is portrayed as the market leader with a massive existing user base, so it must “cover all the bases” rather than specialize narrowly. Claude is framed as a challenger that benefits from specialization, particularly code-focused behavior. The transcript claims Claude can apply reasoning selectively depending on what users need, while GPT-4.5 aims for broad competence plus distinctive interaction qualities.

What does “hybridization” mean in this context?

Hybridization refers to combining GPT-4.5’s valuable capabilities with other models already in the lineup to reduce compute costs while preserving user-facing strengths. The transcript suggests that once the complex primitives are established, OpenAI can mix them in ways that lower inference expense. This is presented as a pathway toward a future GPT-5 direction with emotional intelligence and reasoning components.
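
As a rough illustration of how hybridization could lower average inference cost, here is a toy routing sketch. The cheaper model, its price, the routing heuristic, and the traffic split are all assumptions made for the example, not details from the transcript; only GPT-4.5’s cited output price comes from the text above.

```python
# Illustrative only: a toy router that mixes a cheaper model with an expensive
# one to cut average inference cost. Model names, the cheap model's price, the
# routing heuristic, and the 10% traffic share are assumptions for this sketch.

EXPENSIVE = ("gpt-4.5", 150.0)   # (model, cited $ per million output tokens)
CHEAP = ("cheaper-model", 15.0)  # hypothetical lower-cost model in the lineup

def route(prompt: str) -> tuple[str, float]:
    """Send emotionally nuanced or style-heavy turns to the expensive model,
    and everything else to the cheaper one."""
    needs_nuance = any(w in prompt.lower() for w in ("feel", "rewrite", "tone"))
    return EXPENSIVE if needs_nuance else CHEAP

# Blended output cost if only 10% of traffic actually needs the expensive model:
share_expensive = 0.10
blended = share_expensive * EXPENSIVE[1] + (1 - share_expensive) * CHEAP[1]
print(f"blended output cost: ${blended:.2f} per million tokens")  # ~$28.50
```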

Why are standard evals portrayed as insufficient?

The transcript argues that benchmark-style evaluations often fail to capture user-relevant behaviors such as emotional intelligence, nuanced tone, and surprise. It also cites Claude 3.7 Sonnet’s “opinionated” code scaffolding as an example of a behavior that can help users build faster even if it doesn’t register as a clear eval win. Better evaluation methods are implied as necessary to measure these practical effects.

Review Questions

  1. What pricing differences between GPT-4.5 and Claude 3.7 Sonnet are cited, and how do those differences connect to rollout timing?
  2. Which GPT-4.5 qualities are described as most important for long-term user stickiness, and why might benchmarks miss them?
  3. How does the transcript use Claude 3.7 Sonnet’s code “scaffolding” to argue for better evaluation methods?

Key Points

  1. GPT-4.5 is described as significantly more expensive to serve than Claude 3.7 Sonnet, with cited output around $150 per million tokens and input around $75 per million tokens.
  2. Access to GPT-4.5 is limited to Pro Plan users first, with Plus expected next week, reflecting compute and infrastructure constraints.
  3. GPT-4.5 is positioned as a foundational “Lego block” for long-term ChatGPT stickiness, emphasizing emotional intelligence, nuanced writing, and the ability to surprise.
  4. Standard benchmarks are portrayed as inadequate for measuring user-facing interaction qualities that matter in real conversations.
  5. OpenAI’s strategy is framed as broad coverage for a market leader, while Claude is framed as benefiting from specialization—especially in code.
  6. Hybridizing GPT-4.5 capabilities with other models is presented as a path to lower compute costs while preserving key experience traits.
  7. Claude 3.7 Sonnet is used as an example of opinionated code structure that can speed users up even if it doesn’t show up in evals.

Highlights

GPT-4.5’s cited token pricing (about $150 output / $75 input per million tokens) is framed as a compute-cost signal behind Pro-only access.
Emotional intelligence, nuanced writing, and “surprise” are treated as long-term retention drivers that benchmarks may miss.
GPT-4.5 is described as a “last Lego block” meant to enable a more compelling, sticky ChatGPT experience through future hybridization.
Claude 3.7 Sonnet’s more opinionated code scaffolding is presented as a practical advantage that traditional evals may not capture.

Topics

  • GPT-4.5 Pricing
  • Model Rollout
  • Compute Costs
  • Model Strategy
  • Evaluation Methods