
OpenAI’s new API is 200x more expensive than competition

Theo - t3.gg
5 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

OpenAI’s o1 Pro API pricing is positioned dramatically higher than competitors’ ($150 per million input tokens and $600 per million output tokens), creating a major value problem.

Briefing

OpenAI’s newly launched o1 Pro API pricing lands at $150 per million input tokens and $600 per million output tokens—an order-of-magnitude jump that makes it dramatically more expensive than competing reasoning models. The core issue isn’t just sticker shock; it’s that the “reasoning” work behind the scenes consumes extra tokens, so users pay for compute that often doesn’t translate into better results or better ergonomics.

The transcript breaks down how token-based billing turns “thinking” into cost. Tokens are the small chunks models process to predict the next token in a sequence, and more tokens generally means more GPU work. With reasoning-enabled models, the model can generate additional intermediate tokens before producing the final answer. OpenAI doesn’t expose those thinking tokens in the API, but they still exist—so enabling higher-effort modes can inflate both input and output token counts even when the final response length looks similar.
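That billing dynamic is easy to sketch. The rates below are the o1 Pro prices quoted here; the reasoning-token counts are hypothetical illustrations, since the API doesn’t expose the actual numbers:

```python
# Sketch of how token-based billing turns "reasoning" into cost.
# Prices are the o1 Pro API rates quoted in the transcript; the
# reasoning-token counts below are hypothetical illustrations.

INPUT_PRICE_PER_M = 150.00   # $ per million input tokens (o1 Pro)
OUTPUT_PRICE_PER_M = 600.00  # $ per million output tokens (o1 Pro)

def request_cost(input_tokens, visible_output_tokens, reasoning_tokens):
    """Hidden reasoning tokens are billed as output even though the API never shows them."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE_PER_M
            + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000

# Same visible answer length, very different bills:
print(request_cost(2_000, 500, 0))       # 0.6  — no hidden reasoning
print(request_cost(2_000, 500, 20_000))  # 12.6 — heavy "high effort" reasoning
```

The point of the sketch: the visible response is identical in both calls, but the second request costs roughly 20x more because the hidden intermediate tokens are billed at the output rate.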

That dynamic helps explain why cheaper models can feel faster and more cost-effective. The speaker contrasts OpenAI’s pricing with Google’s Gemini 2.0 Flash, which undercuts older expectations by combining competitive quality with speed and low cost—attributed to Google’s vertical integration (its own processors, architecture, training data pipeline, and infrastructure). The transcript also points to competitive pressure from models like DeepSeek’s reasoning releases, which have repeatedly forced price adjustments across the market.

Within OpenAI’s own lineup, the pricing story becomes more contentious. The speaker says o3 Mini High delivers better real-world outcomes than o1 Pro while costing far less, and that o1 Pro’s experience is not only expensive but also unreliable in practice—failing requests, awkward UI behavior, and even incorrect answers on a known Advent of Code-style test. Even when o1 Pro eventually returns something, the transcript frames it as a poor trade: higher cost, slower or brittle behavior, and results that don’t justify the premium.

A key quantitative claim anchors the argument: o1 Pro is about 136x more expensive than o3 mini high for input tokens and roughly the same multiple for output tokens ($600 vs $4.40 per million). The speaker interprets this as a pricing outlier—possibly a “feature parity” move to ship something tied to internal work rather than a model optimized for value. They also argue that OpenAI’s margins on heavy-usage models can be thin or negative, citing the earlier example of o1 Pro on the $200/month ChatGPT tier, which allegedly lost money due to heavy usage.

The transcript ends by reframing the market direction: while OpenAI’s API pricing jumps upward, other providers keep pushing down costs and improving efficiency. The speaker expresses confusion that OpenAI would release a model that is both more expensive and, in their testing, worse than o3 Mini High—then suggests that the “high” tiers may be better understood as an allowed “effort” budget rather than a fundamentally different model, with UI limitations forcing separate naming on OpenAI’s side.

Cornell Notes

OpenAI’s o1 Pro API is priced at $150 per million input tokens and $600 per million output tokens, making it dramatically more expensive than competing options. The transcript ties the cost gap to token billing and reasoning: models that “think” generate extra intermediate tokens before producing the final answer, and those extra tokens still cost money even if the API doesn’t show them. In the speaker’s tests, o3 mini high delivers better or comparable outcomes at far lower cost, roughly 136x cheaper on both input and output pricing. The speaker also argues that OpenAI’s o1 Pro experience can be brittle (request failures and awkward UI behavior), further weakening the value proposition. Overall, the pricing move looks out of step with industry trends toward cheaper, faster inference.

Why does reasoning increase API cost even when the final answer length seems similar?

Billing is token-based. Tokens are the small chunks a model processes to predict the next token. When reasoning is enabled, the model generates additional intermediate tokens before it outputs the final response. Even if OpenAI doesn’t display those “thinking” tokens in the API, they still consume compute and therefore show up as higher input/output token usage and higher cost.

What makes Gemini 2.0 Flash cheaper in the transcript’s explanation?

The transcript attributes Gemini’s low price to Google’s efficiency advantages: it built its own processors and architecture, has the data pipeline needed for training, and controls the infrastructure for hosting and serving the model. That vertical integration reduces cost per token relative to competitors that lack the same end-to-end stack.

How does the transcript quantify the gap between 01 Pro and 03 mini high?

It claims o1 Pro is about 136x more expensive than o3 mini high for input tokens ($150 vs $1.10 per million) and about the same multiple for output tokens ($600 vs $4.40 per million). The speaker frames this as an extreme mismatch given that o1 Pro isn’t delivering proportionally better results.
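The ~136x multiple can be checked directly, assuming o3 mini high is billed at $1.10 per million input tokens and $4.40 per million output tokens (the rates the transcript’s figures imply):

```python
# Checking the transcript's ~136x claim, assuming o3 mini high is
# priced at $1.10 / $4.40 per million input/output tokens (the rates
# implied by the "$600 vs $4.40" comparison).
o1_pro = {"input": 150.00, "output": 600.00}       # $ per million tokens
o3_mini_high = {"input": 1.10, "output": 4.40}     # $ per million tokens

for kind in ("input", "output"):
    ratio = o1_pro[kind] / o3_mini_high[kind]
    print(f"{kind}: {ratio:.1f}x")  # both come out near 136x
```

Both ratios land at roughly 136.4x, which is why the transcript quotes a single multiple for input and output alike.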

What practical problems does the transcript associate with o1 Pro?

Beyond cost, the transcript describes request failures and UI/interaction issues: the model can take a long time, fail when switching contexts (especially on mobile), and sometimes produces incorrect answers on a known test. The speaker also complains about missing copy/edit affordances and confusing chat behavior, which makes the expensive model harder to use effectively.

What does “high” mean for o3 mini high in the transcript?

The transcript clarifies that “low/medium/high” isn’t a different base model; it’s an allowed effort budget—how much token generation/time the model can spend before answering. Higher settings permit more intermediate generation (more reasoning tokens), which increases cost and can improve answer likelihood, but the transcript argues the extra work isn’t always useful to the user.
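One way to picture the effort budget (the budget values here are hypothetical illustrations, not OpenAI’s actual limits; the $4.40 output rate is the o3 mini figure implied by the transcript):

```python
# "low/medium/high" modeled as an effort budget: the setting caps how
# many hidden reasoning tokens the model may spend, not which base
# model runs. Budget values below are hypothetical illustrations.
OUTPUT_PRICE_PER_M = 4.40  # $ per million output tokens (o3 mini rate implied by the transcript)
EFFORT_BUDGETS = {"low": 1_024, "medium": 8_192, "high": 65_536}

def worst_case_reasoning_cost(effort):
    """Upper bound on what the hidden 'thinking' alone could add to a request."""
    return EFFORT_BUDGETS[effort] * OUTPUT_PRICE_PER_M / 1_000_000

for effort in EFFORT_BUDGETS:
    print(f"{effort}: up to ${worst_case_reasoning_cost(effort):.4f} per request")
```

Under this framing, the same base model at “high” simply has permission to burn more billable tokens before answering, which is exactly why the transcript treats effort level as a cost control rather than a quality tier.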

Why does the transcript suggest OpenAI’s pricing move may be out of step with the market?

The speaker argues the broader industry trend has been toward cheaper inference and better value, driven by efficiency gains and competitive pressure (e.g., DeepSeek and Gemini). Against that backdrop, o1 Pro’s extreme token pricing looks like a release that prioritizes shipping internal work over delivering value, especially when cheaper models like o3 mini high perform better in their tests.

Review Questions

  1. How do token-based billing and hidden reasoning tokens combine to make “high effort” modes more expensive?
  2. What specific comparisons does the transcript use to argue that o3 mini high is better value than o1 Pro?
  3. According to the transcript, what does the “high” tier actually control, and why does that matter for cost?

Key Points

  1. OpenAI’s o1 Pro API pricing is positioned dramatically higher than competitors’ ($150 per million input tokens and $600 per million output tokens), creating a major value problem.

  2. Token billing means reasoning modes can cost more because they generate extra intermediate tokens before the final answer.

  3. OpenAI’s API doesn’t expose thinking tokens, but the transcript argues those tokens still exist and drive cost.

  4. Gemini 2.0 Flash is described as cheaper due to Google’s vertical integration—processors, architecture, training pipeline, and infrastructure.

  5. In the transcript’s testing, o3 mini high delivers better or comparable results at far lower cost than o1 Pro, with claimed ~136x multiples.

  6. The transcript links o1 Pro’s poor value not only to price but also to reliability and UI/interaction friction (including request failures).

  7. The transcript frames the broader market direction as cost-down and efficiency-up, making OpenAI’s pricing jump feel misaligned.

Highlights

o1 Pro’s API pricing is framed as an extreme outlier: about 136x more expensive than o3 mini high for both input and output token costs (per the transcript’s calculations).
Reasoning increases token generation, and hidden intermediate tokens can inflate cost even when the final response length looks similar.
Gemini 2.0 Flash’s low price is attributed to Google’s end-to-end control of chips, architecture, training, and serving infrastructure.
The transcript portrays o1 Pro as not just expensive but also brittle in real use, with failures and awkward UI behavior undermining the premium.
