GPT-5: Everything You Need to Know So Far
Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
OpenAI’s full-scale GPT-5 training run appears to be underway, with safety red-teaming already positioned for the next phase of testing. The strongest signals come from Greg Brockman’s remarks about scaling up to “maximally harness” computing resources for the biggest model yet, alongside Jason Wei’s reaction to the “massive GPU training” milestone. The timing matters because it suggests GPT-5 is moving from smaller, earlier training checkpoints into the longer, higher-stakes work of safety evaluation and capability validation—an arc that typically takes months rather than days.
Additional evidence points to safety readiness rather than a vague “soon” announcement. OpenAI closed applications for its red-teaming network, and applicants were told they’d learn their status by the end of last year. In practice, that implies the red-team workforce is already in place to begin testing as GPT-5 progresses through checkpoints. Those checkpoints are important: even before a final model is fully trained, teams can evaluate intermediate versions, meaning OpenAI could effectively have “GPT-4.2”-style capability snapshots before the complete GPT-5 release.
The capability direction described by OpenAI insiders and aligned research is clear: GPT-5 is expected to “think for longer” by laying out reasoning steps and then verifying them. Sam Altman and other executives frame this as a shift toward more interactive, stepwise explanations that users can judge for reasonableness. The transcript also ties this to OpenAI’s “let’s verify step-by-step” work, where sampling a base model thousands of times and selecting outputs with higher-rated reasoning steps produced large gains in math and strong results across STEM. The key mechanism is parallelization: generate many candidate reasoning traces, then use a verifier-like process to pick the best.
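The sample-then-select loop described above can be sketched in a few lines. Everything here is a toy stand-in: `sample_trace` and `verifier_score` are hypothetical placeholders for a base model's sampling and a trained process-reward verifier, not OpenAI's actual components. The shape of the loop is the point: many candidates generated in parallel, one score-based pick.

```python
import random

def sample_trace(prompt: str, rng: random.Random) -> list[str]:
    """Stand-in for sampling one chain of reasoning steps from a model."""
    return [f"step {i + 1} toward: {prompt}" for i in range(rng.randint(2, 6))]

def verifier_score(trace: list[str]) -> float:
    """Stand-in verifier. A real process-reward model rates each step's
    correctness and aggregates (e.g. takes the minimum per-step score)."""
    rng = random.Random(" ".join(trace))          # deterministic toy scores
    return min(rng.random() for _ in trace)

def best_of_n(prompt: str, n: int, seed: int = 0) -> list[str]:
    """Sample n candidate traces (embarrassingly parallel in practice),
    then keep the one the verifier rates highest."""
    rng = random.Random(seed)
    candidates = [sample_trace(prompt, rng) for _ in range(n)]
    return max(candidates, key=verifier_score)

best = best_of_n("compute 12 * 13", n=1000)
print(len(best), "steps in the selected trace")
```

Because scoring is independent per candidate, the expensive part (sampling) scales out horizontally, which is why "thousands of samples" is feasible at all.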
On reliability, the same logic scales. If GPT-4 can be improved by repeatedly sampling and selecting stronger answers, GPT-5’s larger training and improved reasoning verification could make it far more dependable—especially for tasks where a single response can be “almost right” but not consistently correct. The transcript further connects this to prior approaches in coding and math, including DeepMind’s AlphaCode 2, which used massive sampling to reach high contest performance.
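The reliability argument has a simple probabilistic core worth making explicit: if a single sample solves a task with probability p and samples are roughly independent, the chance that at least one of n samples is correct is 1 − (1 − p)^n. The numbers below are illustrative, not from the transcript; a perfect selector would capture this full gain, and verifier quality determines how much survives in practice.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct."""
    return 1 - (1 - p) ** n

# A hard task the model solves only 2% of the time per attempt:
print(round(coverage(0.02, 1), 3))     # 0.02
print(round(coverage(0.02, 100), 3))   # 0.867
print(round(coverage(0.02, 1000), 3))  # 1.0
```

This is the same logic behind AlphaCode 2's massive sampling: even a low per-attempt success rate becomes near-certain coverage at scale, shifting the hard problem from generation to selection.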
Hardware and model size expectations also enter the picture. In an interview, Etched AI CEO Gavin Uberti suggested GPT-5 could have roughly 10× the parameter count of GPT-4, potentially driven by larger embedding dimensions, more layers, and more experts in a mixture-of-experts-style design. While exact numbers remain speculative, the underlying theme is that GPT-5’s performance gains likely come from both scale and better internal checking.
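A back-of-envelope calculation shows how those three knobs compound: embedding width, layer count, and expert count multiply rather than add. The function and all configs below are illustrative assumptions for a generic decoder-only transformer (standard attention projections plus a 4×-wide feed-forward block replicated per expert), not leaked GPT-4 or GPT-5 specifications.

```python
def transformer_params(d_model: int, n_layers: int, vocab: int,
                       n_experts: int = 1, ffn_mult: int = 4) -> int:
    """Rough decoder-only transformer parameter count (biases, norms ignored)."""
    attn = 4 * d_model * d_model                   # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)       # up- and down-projection
    per_layer = attn + n_experts * ffn             # MoE replicates the FFN per expert
    return n_layers * per_layer + vocab * d_model  # plus the token-embedding table

# Purely hypothetical configs: a dense baseline vs. a wider, deeper,
# two-expert variant.
base = transformer_params(d_model=8192, n_layers=96, vocab=100_000)
big = transformer_params(d_model=16384, n_layers=140, vocab=100_000, n_experts=2)
print(f"{base / 1e9:.0f}B -> {big / 1e9:.0f}B ({big / base:.1f}x)")
```

Under these toy numbers the total lands near the rumored ~10× jump; the takeaway is only that width, depth, and experts multiply together, so no single factor has to grow tenfold on its own.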
Finally, the transcript argues for a late-2024 release path rather than an immediate launch. The prediction lands toward the end of November 2024, based on multi-month training time plus extended safety testing, and also on avoiding the most contentious political period in the U.S. The release, it suggests, may arrive in staged checkpoints—capabilities rolling out over time rather than all at once—while OpenAI continues to push multimodality (speech, images, video) and, most importantly, reasoning reliability.
Cornell Notes
Signals from OpenAI leadership and researchers indicate GPT-5’s full-scale training run is underway, with red-teaming already positioned for safety testing as checkpoints emerge. The expected leap centers on longer, stepwise reasoning that can be verified—supported by OpenAI’s “let’s verify step-by-step” results, where sampling thousands of reasoning traces and selecting the best boosted math and STEM performance. Reliability improvements are framed as a scaling of the same idea: generate many attempts, then choose outputs with stronger reasoning. Hardware and architecture speculation points to much larger scale (possibly ~10× GPT-4 parameters) via embedding dimension, layers, and expert count. A late-2024, staged rollout is predicted, factoring in training duration, safety cycles, and political timing risks.
What evidence suggests GPT-5 training has moved beyond early experiments?
Why do checkpoints matter for when GPT-5 capabilities appear?
How does “thinking for longer” connect to verifier-based gains?
What reliability improvement is implied by sampling and selection?
What architectural scale changes are suggested for GPT-5?
Why predict a late-2024 rollout instead of an immediate release?
Review Questions
- What specific mechanism in “let’s verify step-by-step” produces large gains, and why does sampling help?
- How do checkpoints change the practical timeline for when users might see GPT-5-like capabilities?
- Which factors (training duration, safety testing, and political timing) drive the late-2024 release prediction?
Key Points
1. Greg Brockman’s remarks about scaling compute and Jason Wei’s “massive GPU training” reaction point to GPT-5’s full-scale training run being underway.
2. OpenAI’s red-teaming network closure and prior status timeline suggest safety testing is ready to begin as GPT-5 moves through checkpoints.
3. Checkpoint-based evaluation means intermediate GPT-5 capability snapshots could appear before the final model is fully trained.
4. GPT-5’s expected performance jump centers on longer, stepwise reasoning paired with verification and selection of stronger reasoning traces.
5. OpenAI’s “let’s verify step-by-step” results highlight how thousands of samples plus a verifier-like selection process can substantially improve math and STEM outcomes.
6. Reliability gains are framed as a scaling of sampling-and-selection: more attempts plus better selection yields more consistently strong answers.
7. A late-2024, staged rollout is predicted based on training time, safety testing length, and the desire to avoid election-related controversy.