Everything you need to know about GPT-5 (+ mini and nano)
Based on Theo - t3․gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
GPT-5 standard pricing is $1.25 per million input tokens and $10 per million output tokens, with GPT-5 mini at $0.25/$2 and GPT-5 nano at $0.05/$0.40.
Briefing
GPT-5 arrives with a pricing structure and routing system that make it practical to use more often—especially the “mini” and “nano” tiers—while OpenAI’s safety and reliability work targets long-standing pain points like sycophancy, hallucinations, and instruction leakage. The headline takeaway is cost-efficiency at scale: GPT-5’s standard model is priced at $1.25 per million input tokens and $10 per million output tokens, and the smaller variants are dramatically cheaper, with GPT-5 mini at $0.25 per million input and $2 per million output, and GPT-5 nano at $0.05 per million input and $0.40 per million output. That combination, plus token caching discounts, is positioned as a shift from “pick the cheapest model” to “choose the right effort level” without blowing up budgets.
The transcript spends significant time putting those numbers in context. Compared with other popular models—citing examples like o3, o1, Claude 4 Opus, and Sonnet—GPT-5’s token rates are framed as meaningfully lower, and the speaker argues that real-world cost can diverge from sticker price because different models generate different token volumes. Benchmarks like SkateBench are used to illustrate that gap: GPT-5 is described as far cheaper per test run than Grok 4, with the speaker claiming roughly two orders of magnitude difference in average cost. The same logic is applied to why “mini” and “nano” matter: they’re portrayed as strong enough to replace cheaper-but-less-capable options, including common Gemini workflows.
Beyond price, GPT-5’s operational profile is defined by a large context window and high output capacity. The input context is said to land at 400,000 tokens, while output can reach up to 128,000 tokens per request. An official knowledge cutoff is given as September 30, aligning with a simple “recent League of Legends champion” test used to infer cutoff timing.
A major theme is model routing inside a “unified system.” Instead of forcing users to manually swap models, GPT-5 is described as combining a fast general model, a deeper reasoning model for harder tasks, and a real-time router that selects which one to use based on conversation type, complexity, tool needs, and explicit intent (such as “think hard”). The routing system is said to be continuously trained on user and preference signals, though details aren’t fully exposed.
Safety and reliability claims form the other pillar. The transcript highlights “safe completions” as an alternative to brittle refusal-only training, aiming to reduce failure modes when user intent is obscured. It also emphasizes reduced sycophancy: offline evaluations reportedly show GPT-5 main performing nearly three times better than GPT-4o (with GPT-5 thinking outperforming both), and the model is described as harder to jailbreak. Instruction hierarchy is presented as another guardrail—system instructions should override developer instructions, which should override user instructions—paired with tests for secret extraction attempts and “access granted” style prompts.
Finally, the transcript ties the technical changes to user experience: GPT-5 is portrayed as more likely to follow instructions and integrate with tooling in a controlled way, while long-context reasoning benchmarks and SVG generation tests are used as proof points. The overall message is that GPT-5 isn’t just “a bit better”—it’s engineered to be cheaper to run, easier to select correctly, and more dependable across safety, accuracy, and tool use, making it a default workhorse rather than a model you only switch to for special tasks.
Cornell Notes
GPT-5 is presented as a more usable default because it combines lower effective cost, large context/output limits, and an internal routing system that chooses between fast and deeper reasoning modes. Pricing is positioned as especially favorable for GPT-5 mini and GPT-5 nano, with token caching discounts that can sharply reduce costs on repeated inputs. The model is described as having a 400,000-token input context and up to 128,000 output tokens per request, with a September 30 knowledge cutoff. Safety work is framed around “safe completions,” reduced sycophancy, stronger instruction hierarchy, and lower hallucination/deception rates in reported evaluations. The practical implication: fewer manual model swaps and more consistent instruction-following and tool behavior.
How do GPT-5’s token prices compare to other mainstream models, and why might sticker price not equal real cost?
What context and output limits are claimed for GPT-5, and what knowledge cutoff is given?
What does “unified system” routing mean in practice for GPT-5?
Which safety training changes are highlighted, and what problem are they meant to address?
What evidence is offered for reduced sycophancy, and why does it matter?
How does GPT-5’s instruction-following and hallucination profile get characterized?
Review Questions
- What pricing and token-generation differences would make two models with similar per-token rates cost very differently in practice?
- How does the described instruction hierarchy (system > developer > user) help prevent prompt-injection attempts to extract secrets or override safety constraints?
- Why might “safe completions” reduce brittle refusal behavior compared with binary refusal training when user intent is ambiguous?
Key Points
- 1
GPT-5 standard pricing is $1.25 per million input tokens and $10 per million output tokens, with GPT-5 mini at $0.25/$2 and GPT-5 nano at $0.05/$0.40.
- 2
Effective cost can differ from token rates because models generate different token volumes for the same task; token caching can further reduce input costs by up to 90%.
- 3
GPT-5 is described as supporting a 400,000-token input context and up to 128,000 output tokens per request, with a September 30 knowledge cutoff.
- 4
A “unified system” routes requests in real time between fast general answering and deeper reasoning based on complexity, tool needs, and explicit intent.
- 5
Safety improvements highlighted include “safe completions” (output-focused safety training), reduced sycophancy, harder-to-jailbreak behavior, and stronger instruction hierarchy enforcement.
- 6
Reported evaluations claim lower hallucination and deception rates for GPT-5 thinking compared with prior model families.
- 7
The practical goal is fewer manual model swaps: GPT-5 is positioned as a more reliable default for instruction-following and tool use.