This Month is HUGE! o3 & o4 mini, Llama 4, VEO 2 in Gemini & Much More!
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” followed by GPT-5 in a few months, reversing an earlier plan to avoid standalone o3.
Briefing
OpenAI is reversing course on its near-term model rollout: o3 and o4 mini are back on the schedule for release in “a couple of weeks,” followed by GPT-5 “in a few months.” The shift matters because it changes how users will access OpenAI’s reasoning-focused models—moving from an earlier plan where o3 would not ship as a standalone product and would instead be folded into a bundled GPT-5 system that selects the best model at the right time.
The earlier February plan, tied to CEO Sam Altman’s tweet, suggested OpenAI would stop shipping o3 as a standalone model and instead position GPT-5 as an integrated system that bundles multiple technologies and dynamically chooses among them. Now Altman says the plan has changed: o3 standalone and o4 mini are coming first, with GPT-5 later. The stated reasons include the difficulty of integrating everything smoothly into one GPT-5 package and the need to ensure enough compute capacity for what Altman calls “unprecedented demand.” That demand concern is underscored by a recent example where OpenAI’s GPT-4o native image generation release reportedly overwhelmed servers.
Altman also frames the change as enabling a better GPT-5 than previously expected, though without technical specifics. The practical implication is that OpenAI is buying time—shipping smaller reasoning models now while working through integration and scaling challenges for the larger GPT-5 vision. There’s also a competitive angle: the rollout timing comes as Google pushes Gemini 2.5 Pro into public preview, with claims of usage records and strong performance, especially for coding and long-context tasks.
Google’s Gemini 2.5 Pro public preview includes pricing details that emphasize aggressive cost. For prompts up to 200,000 tokens, the rate is $1.25 per million input tokens and $10 per million output tokens; above that threshold, pricing rises to $2.50 per million input and $15 per million output. The transcript contrasts this with competing APIs: GPT-4o is positioned as more expensive than Gemini 2.5 Pro, Claude 3.7 Sonnet as pricier still, and OpenAI’s o1 as far more expensive while benchmarking worse than Gemini 2.5 Pro.
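As a rough sketch of what that tiered pricing means in practice, the estimator below applies the per-million-token rates quoted in the preview, assuming the standard tiering where prompts at or under the 200K threshold get the lower rate (actual billing rules may differ):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from the quoted preview rates (per million tokens).

    Hypothetical helper for illustration only; real billing may use
    different tier boundaries or per-request rules.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 1.25, 10.0   # <=200K-token prompts: cheaper tier
    else:
        in_rate, out_rate = 2.50, 15.0   # >200K-token prompts: higher tier
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 100K-token prompt with a 5K-token reply:
# 0.1 * 1.25 + 0.005 * 10 = 0.175 USD
print(round(gemini_25_pro_cost(100_000, 5_000), 3))
```

The threshold matters most for long-context work: crossing 200K input tokens doubles the input rate, so chunking a corpus just under the boundary can halve prompt cost.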
Beyond pricing, the month’s AI calendar stretches across modalities and vendors. Google is also rolling out Veo 2 in Gemini, though the transcript notes that image inputs aren’t available yet, an omission that could limit conditioning use cases. Early Veo 2 tests emphasize fast generation and coherent prompt following across several video prompts, though one example (a robot running with nothing beneath it to land on) shows the model’s occasional lapses in physics and scene consistency.
On the open-source side, Llama 4 is expected “this month,” with reports that Meta is struggling to ship it due to underperformance on reasoning, math, and humanlike-conversation benchmarks. Llama 4 is also said to be switching to a Mixture of Experts architecture, a move linked to the competitive pressure created by DeepSeek’s strong open-source results.
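The transcript names Mixture of Experts only in passing. As a toy illustration of the general technique (not Llama 4’s actual, unpublished design), a gating network scores every expert per token, but only the top-k experts actually run, and their outputs are mixed by renormalized gate probabilities:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, gate_weights, top_k=2):
    """Route one token vector through its top-k experts and mix the
    expert outputs by renormalized gate probabilities."""
    # Gate: one score per expert via a linear layer + softmax.
    scores = softmax([sum(w * x for w, x in zip(row, token))
                      for row in gate_weights])
    top = sorted(range(len(experts)), key=scores.__getitem__, reverse=True)[:top_k]
    norm = sum(scores[i] for i in top)
    out = [0.0] * len(token)
    for i in top:  # only top_k experts execute; the rest stay idle
        y = experts[i](token)
        out = [o + (scores[i] / norm) * v for o, v in zip(out, y)]
    return out

# Four toy "experts": elementwise scaling by different factors.
experts = [lambda t, s=s: [s * x for x in t] for s in (0.5, 1.0, 2.0, 3.0)]
gate_weights = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.0, 0.1]]
print(moe_forward([1.0, 2.0], experts, gate_weights, top_k=2))
```

The appeal for a large open model is that total parameter count can grow with the number of experts while per-token compute stays roughly fixed, since only k experts fire per token.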
Image generation remains a battleground. Ideogram V3 lands around fourth place on the Artificial Analysis image arena with an ELO score near 1,095, while GPT-4o is described as leading thanks to its autoregressive native image generation approach. Midjourney V7 has also been released in alpha testing, with comparisons suggesting improved coherence and prompt adherence over V6 but weaker text rendering—keeping Midjourney’s reputation largely centered on aesthetics rather than instruction-following or typography.
Taken together, the month’s biggest story is the tug-of-war between integrated “one system” model strategies and specialized, separately shipped models—while pricing and scaling constraints determine which platforms can win both developers and everyday users quickly.
Cornell Notes
OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” then targeting GPT-5 for a few months later. That reverses an earlier plan in which o3 would not ship standalone and would instead be folded into a GPT-5 system that dynamically selects the best internal model. The new rationale centers on integration difficulty and the need for enough compute capacity for expected demand, especially after prior server strain from GPT-4o native image generation. Meanwhile, Google is pushing Gemini 2.5 Pro into public preview with long-context strengths and pricing that undercuts several competing APIs. Across the ecosystem, Llama 4, Veo 2 video, and Midjourney V7 are also in motion, keeping competition focused on reasoning quality, multimodal performance, and cost.
- What changed in OpenAI’s rollout plan for o3/o4 mini and GPT-5?
- Why does Altman’s “capacity” argument matter for users and developers?
- How does Gemini 2.5 Pro’s pricing compare, and what does that imply for adoption?
- What’s the practical limitation mentioned for Veo 2 video in Gemini?
- What does the transcript suggest about Llama 4’s development and technical direction?
- How do the image-generation standings and Midjourney V7 comparisons differ by capability?
Review Questions
- How does the updated OpenAI plan change the user experience compared with the earlier “o3 not standalone” approach?
- Which pricing threshold changes Gemini 2.5 Pro’s input/output rates, and why is that relevant for long-context tasks?
- What evidence in the transcript suggests that Midjourney V7’s main strength is aesthetics rather than instruction-following or text rendering?
Key Points
1. OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” followed by GPT-5 in a few months, reversing an earlier plan to avoid standalone o3.
2. Altman cites integration difficulty and compute capacity planning as reasons for the rollout change, pointing to prior server strain from GPT-4o native image generation.
3. Google’s Gemini 2.5 Pro is in public preview with long-context strengths (up to a million tokens) and token pricing that is framed as cheaper than several competing APIs.
4. Veo 2 video generation is rolling out in Gemini, but image inputs are not yet available, limiting certain conditioning workflows.
5. Llama 4 is expected this month but is described as delayed by performance issues; it’s also reported to be switching to Mixture of Experts.
6. In image generation, GPT-4o is described as leading an arena thanks to autoregressive native image generation, while Midjourney V7 improves coherence but remains weak at text.