
Everything you need to know about GPT-5 (+ mini and nano)

Theo - t3.gg · 6 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPT-5 standard pricing is $1.25 per million input tokens and $10 per million output tokens, with GPT-5 mini at $0.25/$2 and GPT-5 nano at $0.05/$0.40.

Briefing

GPT-5 arrives with a pricing structure and routing system that make it practical to use more often—especially the “mini” and “nano” tiers—while OpenAI’s safety and reliability work targets long-standing pain points like sycophancy, hallucinations, and instruction leakage. The headline takeaway is cost-efficiency at scale: GPT-5’s standard model is priced at $1.25 per million input tokens and $10 per million output tokens, and the smaller variants are dramatically cheaper, with GPT-5 mini at $0.25 per million input and $2 per million output, and GPT-5 nano at $0.05 per million input and $0.40 per million output. That combination, plus token caching discounts, is positioned as a shift from “pick the cheapest model” to “choose the right effort level” without blowing up budgets.

The transcript spends significant time putting those numbers in context. Compared with other popular models—citing examples like o3, o1, Claude 4 Opus, and Sonnet—GPT-5’s token rates are framed as meaningfully lower, and the speaker argues that real-world cost can diverge from sticker price because different models generate different token volumes. Benchmarks like SkateBench are used to illustrate that gap: GPT-5 is described as far cheaper per test run than Grok 4, with the speaker claiming roughly two orders of magnitude difference in average cost. The same logic is applied to why “mini” and “nano” matter: they’re portrayed as strong enough to replace cheaper-but-less-capable options, including common Gemini workflows.
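The sticker-price-versus-real-cost argument is easy to make concrete with a quick calculation. The per-million-token rates below are the ones quoted in this summary; the token counts and the "verbose model" comparison are invented purely to illustrate why equal rates can still mean unequal bills:

```python
def task_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of one request, given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Two hypothetical models with identical rates ($3 in / $15 out), but the
# second emits 5x the output tokens for the same task -- tripling its cost.
terse   = task_cost(10_000, 2_000,  3.0, 15.0)   # $0.06
verbose = task_cost(10_000, 10_000, 3.0, 15.0)   # $0.18

# GPT-5's quoted rates ($1.25 in / $10 out) applied to the same terse workload:
gpt5 = task_cost(10_000, 2_000, 1.25, 10.0)      # $0.0325

print(terse, verbose, gpt5)
```

This is the transcript's Grok 4 vs Claude 4 point in miniature: total spend scales with generated token volume, not just the posted rate.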

Beyond price, GPT-5’s operational profile is defined by a large context window and high output capacity. The input context is said to land at 400,000 tokens, while output can reach up to 128,000 tokens per request. An official knowledge cutoff is given as September 30, aligning with a simple “recent League of Legends champion” test used to infer cutoff timing.

A major theme is model routing inside a “unified system.” Instead of forcing users to manually swap models, GPT-5 is described as combining a fast general model, a deeper reasoning model for harder tasks, and a real-time router that selects which one to use based on conversation type, complexity, tool needs, and explicit intent (such as “think hard”). The routing system is said to be continuously trained on user and preference signals, though details aren’t fully exposed.

Safety and reliability claims form the other pillar. The transcript highlights “safe completions” as an alternative to brittle refusal-only training, aiming to reduce failure modes when user intent is obscured. It also emphasizes reduced sycophancy: offline evaluations reportedly show GPT-5 main performing nearly three times better than GPT-4o (with GPT-5 thinking outperforming both), and the model is described as harder to jailbreak. Instruction hierarchy is presented as another guardrail—system instructions should override developer instructions, which should override user instructions—paired with tests for secret extraction attempts and “access granted” style prompts.
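The instruction hierarchy (system over developer over user) can be sketched as a toy precedence resolver. This is purely illustrative, not OpenAI's implementation; the role names follow the hierarchy described above, and everything else is invented:

```python
# Higher number wins: system > developer > user, per the described hierarchy.
PRECEDENCE = {"system": 3, "developer": 2, "user": 1}

def effective_setting(messages, key):
    """Return the value for `key` set by the highest-precedence role."""
    best = None
    for role, settings in messages:
        if key in settings:
            if best is None or PRECEDENCE[role] > PRECEDENCE[best[0]]:
                best = (role, settings[key])
    return best[1] if best else None

msgs = [
    ("system",    {"reveal_secrets": False}),
    ("developer", {"tone": "formal"}),
    ("user",      {"reveal_secrets": True, "tone": "casual"}),  # injection attempt
]
print(effective_setting(msgs, "reveal_secrets"))  # False -- system wins
print(effective_setting(msgs, "tone"))            # formal -- developer beats user
```

The "access granted" and secret-extraction tests mentioned above probe exactly this: whether a user-level instruction can ever displace a system-level one.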

Finally, the transcript ties the technical changes to user experience: GPT-5 is portrayed as more likely to follow instructions and integrate with tooling in a controlled way, while long-context reasoning benchmarks and SVG generation tests are used as proof points. The overall message is that GPT-5 isn’t just “a bit better”—it’s engineered to be cheaper to run, easier to select correctly, and more dependable across safety, accuracy, and tool use, making it a default workhorse rather than a model you only switch to for special tasks.

Cornell Notes

GPT-5 is presented as a more usable default because it combines lower effective cost, large context/output limits, and an internal routing system that chooses between fast and deeper reasoning modes. Pricing is positioned as especially favorable for GPT-5 mini and GPT-5 nano, with token caching discounts that can sharply reduce costs on repeated inputs. The model is described as having a 400,000-token input context and up to 128,000 output tokens per request, with a September 30 knowledge cutoff. Safety work is framed around “safe completions,” reduced sycophancy, stronger instruction hierarchy, and lower hallucination/deception rates in reported evaluations. The practical implication: fewer manual model swaps and more consistent instruction-following and tool behavior.

How do GPT-5’s token prices compare to other mainstream models, and why might sticker price not equal real cost?

GPT-5 is listed at $1.25 per million input tokens and $10 per million output tokens. GPT-5 mini is $0.25 in / $2 out, and GPT-5 nano is $0.05 in / $0.40 out. The transcript argues that real cost depends on how many tokens a model generates for the same task—so two models with the same token rates can differ hugely in total spend. It cites an example where Grok 4 costs more to run than Claude 4 despite similar pricing because Grok 4 generates far more tokens.
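How the three tiers and the caching discount interact can be sketched with the rates above. The 90% cached-input discount figure comes from this summary's key points; the workload (request count, token sizes, cacheable fraction) is hypothetical:

```python
# Per-million-token rates quoted in the summary: (input, output).
RATES = {
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25,  2.00),
    "gpt-5-nano": (0.05,  0.40),
}

def monthly_cost(model, requests, in_tok, out_tok, cached_fraction=0.0,
                 cache_discount=0.90):
    """Total monthly cost; cached input tokens are billed at a 90% discount."""
    in_rate, out_rate = RATES[model]
    cached = in_tok * cached_fraction
    fresh = in_tok - cached
    per_request = (fresh * in_rate
                   + cached * in_rate * (1 - cache_discount)
                   + out_tok * out_rate) / 1_000_000
    return requests * per_request

# Hypothetical workload: 100k requests/month, 3k input / 500 output tokens,
# with 80% of the input prompt shared (cacheable) across requests.
for model in RATES:
    print(model, round(monthly_cost(model, 100_000, 3_000, 500, 0.8), 2))
```

Under these assumptions the spread is roughly $605 / $121 / $24 per month across the three tiers, which is the "choose the right effort level" trade-off in numbers.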

What context and output limits are claimed for GPT-5, and what knowledge cutoff is given?

The context window is described as 400,000 tokens for input. Output is described as up to 128,000 tokens per request. The knowledge cutoff is given as September 30, and the transcript notes this matches an “infer cutoff by asking for the most recent League of Legends champion” method.

What does “unified system” routing mean in practice for GPT-5?

Instead of requiring users to manually pick a model, GPT-5 is described as combining: (1) a smart and fast general model, (2) a deeper reasoning model for harder problems, and (3) a real-time router that selects which model to use based on conversation type, complexity, tool needs, and explicit intent (e.g., “Think hard about this”). The router is said to be trained on real signals like user model switching and preference rates, though some routing details aren’t exposed externally.
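The routing description can be pictured as a toy dispatcher. None of these signal names or thresholds come from OpenAI; the real router is a continuously trained model whose internals are not public, so this only mirrors the decision inputs the transcript lists:

```python
# Toy sketch of the "unified system" router described in the transcript.
# All thresholds and signal names here are invented for illustration.

def route(prompt: str, needs_tools: bool = False,
          complexity: float = 0.0) -> str:
    """Pick a fast model or a deeper reasoning model for one request."""
    explicit_intent = "think hard" in prompt.lower()   # explicit user intent
    if explicit_intent or needs_tools or complexity > 0.7:
        return "reasoning-model"
    return "fast-model"

print(route("What's the capital of France?"))                # fast-model
print(route("Think hard about this proof", complexity=0.9))  # reasoning-model
print(route("Summarize this repo", needs_tools=True))        # reasoning-model
```

The actual system replaces these hand-written rules with learned signals (model-switching behavior, preference rates), but the inputs are the same: conversation type, complexity, tool needs, and explicit intent.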

Which safety training changes are highlighted, and what problem are they meant to address?

The transcript highlights “safe completions,” which centers safety on the assistance output rather than using a binary refusal boundary based only on whether the user request is allowed. The motivation is that refusal-only training can be brittle when user intent is obscured, and binary boundaries can be ill-suited for dual-use areas like biology and cybersecurity where partial, high-level completion might still enable misuse.

What evidence is offered for reduced sycophancy, and why does it matter?

Sycophancy is described as a major issue associated with GPT-4o behavior. For GPT-5, offline evaluations reportedly show GPT-5 main scoring nearly three times better than the most recent GPT-4o on a sycophancy metric (0.145 vs 0.052). GPT-5 thinking is said to outperform both, and the transcript also notes free users may show more “crazy” responses than paid users in these comparisons.

How does GPT-5’s instruction-following and hallucination profile get characterized?

The transcript emphasizes instruction hierarchy and instruction-following as a key practical improvement: GPT-5 is portrayed as doing what it’s told more reliably than prior models. For hallucinations, reported testing is summarized as GPT-5 thinking making incorrect claims in under 5% of responses, versus higher error rates for GPT-4o and o3 (with responses containing major incorrect claims rising above 20% for those). Deception tests are also described as showing much lower deception rates for GPT-5.

Review Questions

  1. What pricing and token-generation differences would make two models with similar per-token rates cost very differently in practice?
  2. How does the described instruction hierarchy (system > developer > user) help prevent prompt-injection attempts to extract secrets or override safety constraints?
  3. Why might “safe completions” reduce brittle refusal behavior compared with binary refusal training when user intent is ambiguous?

Key Points

  1. GPT-5 standard pricing is $1.25 per million input tokens and $10 per million output tokens, with GPT-5 mini at $0.25/$2 and GPT-5 nano at $0.05/$0.40.
  2. Effective cost can differ from token rates because models generate different token volumes for the same task; token caching can further reduce input costs by up to 90%.
  3. GPT-5 is described as supporting a 400,000-token input context and up to 128,000 output tokens per request, with a September 30 knowledge cutoff.
  4. A “unified system” routes requests in real time between fast general answering and deeper reasoning based on complexity, tool needs, and explicit intent.
  5. Safety improvements highlighted include “safe completions” (output-focused safety training), reduced sycophancy, harder-to-jailbreak behavior, and stronger instruction hierarchy enforcement.
  6. Reported evaluations claim lower hallucination and deception rates for GPT-5 thinking compared with prior model families.
  7. The practical goal is fewer manual model swaps: GPT-5 is positioned as a more reliable default for instruction-following and tool use.

Highlights

GPT-5 mini and nano pricing is framed as low enough to replace common “cheap model” workflows, especially when token caching applies.
The transcript credits GPT-5’s internal router with selecting fast vs reasoning behavior automatically, based on request complexity and tool needs.
Safety work is centered on “safe completions” and reduced sycophancy, with offline metrics showing large improvements over GPT-4o.
GPT-5 is described as cutting hallucinations and deception rates substantially in reported tests, with GPT-5 thinking under 5% incorrect claims.
