
This Month is HUGE! o3 & o4 mini, Llama 4, VEO 2 in Gemini & Much More!

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” followed by GPT-5 in a few months, reversing an earlier plan to avoid standalone o3.

Briefing

OpenAI is reversing course on its near-term model rollout: o3 and o4 mini are back on the schedule for release in “a couple of weeks,” followed by GPT-5 “in a few months.” The shift matters because it changes how users will access OpenAI’s reasoning-focused models—moving from an earlier plan where o3 would not ship as a standalone product and would instead be folded into a bundled GPT-5 system that selects the best model at the right time.

The earlier February plan, tied to CEO Sam Altman’s tweet, suggested OpenAI would stop shipping o3 as a standalone model and instead position GPT-5 as an integrated system that bundles multiple technologies and dynamically chooses among them. Now Altman says the plan has changed: o3 standalone and o4 mini are coming first, with GPT-5 later. The stated reasons include the difficulty of integrating everything smoothly into one GPT-5 package and the need to ensure enough compute capacity for what Altman calls “unprecedented demand.” That demand concern is underscored by a recent example where OpenAI’s GPT-4o native image generation release reportedly overwhelmed servers.

Altman also frames the change as enabling a better GPT-5 than previously expected, though without technical specifics. The practical implication is that OpenAI is buying time—shipping smaller reasoning models now while working through integration and scaling challenges for the larger GPT-5 vision. There’s also a competitive angle: the rollout timing comes as Google pushes Gemini 2.5 Pro into public preview, with claims of usage records and strong performance, especially for coding and long-context tasks.

Google’s Gemini 2.5 Pro public preview includes pricing details that emphasize aggressively low cost. For prompts up to 200,000 tokens, the rate is $1.25 per million input tokens and $10 per million output tokens; above that threshold, pricing rises to $2.50 per million input and $15 per million output. The transcript contrasts this with commonly used competing tiers: OpenAI’s GPT-4o is positioned as more expensive than Gemini 2.5 Pro, Anthropic’s Claude 3.7 Sonnet as even pricier, and OpenAI’s o1 as far more expensive while benchmarking worse than Gemini 2.5 Pro.
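To make the tiered economics concrete, here is a minimal cost sketch. It assumes Google's published preview rates ($1.25/$10 per million input/output tokens for prompts up to 200,000 tokens, $2.50/$15 above), which may change at any time; the function and constant names are illustrative, not part of any SDK.

```python
# Gemini 2.5 Pro preview pricing (USD per million tokens), tiered by prompt
# size. Rates are a snapshot; check Google's current pricing page before
# relying on them.
SMALL_PROMPT = {"input": 1.25, "output": 10.0}   # prompts up to 200k tokens
LARGE_PROMPT = {"input": 2.50, "output": 15.0}   # prompts above 200k tokens
THRESHOLD = 200_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request in USD under tiered pricing."""
    tier = LARGE_PROMPT if input_tokens > THRESHOLD else SMALL_PROMPT
    return (input_tokens * tier["input"] + output_tokens * tier["output"]) / 1_000_000

# A 100k-token prompt with a 2k-token answer: 0.1 * $1.25 + 0.002 * $10
print(round(request_cost(100_000, 2_000), 3))  # → 0.145
```

The tier is chosen by prompt size, so the same long-context workload that benefits from the million-token window also pays the higher per-token rate once it crosses the threshold.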

Beyond pricing, the month’s AI calendar stretches across modalities and vendors. Google is also rolling out VEO 2 to Gemini, with the transcript noting that image inputs aren’t available yet, an omission that could limit use cases. Early tests of VEO 2 emphasize fast generation and coherent prompt following across several video prompts, though one example (a robot jumping with “nothing” to land on) shows the model’s occasional physics and scene-consistency gaps.

On the open-source side, Llama 4 is expected “this month,” with reports of Meta struggling to get it out due to performance issues in reasoning, math, and humanlike conversation benchmarks. Llama 4 is also said to be switching to Mixture of Experts, a move linked to the competitive pressure created by DeepSeek’s strong open-source results.

Image generation remains a battleground. Ideogram 3.0 is placed on Artificial Analysis’s image arena, landing around fourth place with an Elo score near 1,095, while GPT-4o is described as leading due to its autoregressive native image generation approach. Midjourney V7 has also been released in alpha testing, with comparisons suggesting improved coherence and prompt adherence over V6, but weaker text performance, keeping Midjourney’s reputation largely centered on aesthetics rather than instruction-following or typography.
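The “Elo score” that arena leaderboards report comes from pairwise human votes: each side-by-side comparison is scored like a chess match, and a rating gap maps to an expected win rate. A minimal sketch of the standard Elo expected-score formula (the arena’s exact rating update rule and K-factor are not specified in the video, so the numbers below are purely illustrative):

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A model rated ~1,095 vs. a hypothetical rival at 1,045: a 50-point gap
# corresponds to roughly a 57% expected win rate in head-to-head votes.
print(round(elo_expected(1095, 1045), 2))  # → 0.57
```

This is why small rating differences near the top of a leaderboard matter less than they look: a few dozen Elo points translate to only a modest edge in pairwise preference.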

Taken together, the month’s biggest story is the tug-of-war between integrated “one system” model strategies and specialized, separately shipped models—while pricing and scaling constraints determine which platforms can win both developers and everyday users quickly.

Cornell Notes

OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” then targeting GPT-5 for a few months later. That reverses an earlier plan where o3 would not ship standalone and would instead be folded into a GPT-5 system that dynamically selects the best internal model. The new rationale centers on integration difficulty and the need for enough compute capacity for expected demand, especially after prior server strain from GPT-4o native image generation. Meanwhile, Google is pushing Gemini 2.5 Pro into public preview with long-context strengths and pricing that undercuts several competing APIs. Across the ecosystem, Llama 4, VEO 2 video in Gemini, and Midjourney V7 are also in motion, keeping competition focused on reasoning quality, multimodal performance, and cost.

What changed in OpenAI’s rollout plan for o3/o4 mini and GPT-5?

OpenAI previously signaled that o3 would not ship as a standalone model, with GPT-5 envisioned as an integrated system that bundles multiple technologies and chooses the best model at the right time. The updated plan brings o3 standalone and o4 mini back “in a couple of weeks,” with GPT-5 arriving “in a few months.” The shift is attributed to integration challenges and capacity planning for demand, plus the claim that GPT-5 can be made better than originally expected.

Why does Altman’s “capacity” argument matter for users and developers?

The transcript links the capacity concern to real operational strain: GPT-4o native image generation reportedly “swamped up their servers.” If OpenAI expects unprecedented demand for GPT-5 (a name that implies broad interest), then delaying full integration into GPT-5 while shipping smaller minis first can reduce risk—ensuring the platform can handle traffic without degrading performance.

How does Gemini 2.5 Pro’s pricing compare, and what does that imply for adoption?

Gemini 2.5 Pro is priced at $1.25 per million input tokens and $10 per million output tokens for prompts up to 200,000 tokens, and $2.50/$15 per million for larger prompts. The transcript frames this as cheaper than commonly referenced competing tiers (OpenAI’s GPT-4o, Anthropic’s Claude 3.7 Sonnet) and far cheaper than OpenAI’s o1, which is described as orders of magnitude more expensive. Lower cost per token, especially for long-context workloads, can drive adoption even if users value reasoning quality differently.

What’s the practical limitation mentioned for Google’s VEO 2 video in Gemini?

A key constraint is that image inputs aren’t available with VEO 2 in Gemini yet. That can block workflows that rely on image-conditioned video generation. Still, the transcript’s tests emphasize fast generation and coherent outputs for several prompts (e.g., slow-motion fox footage and cinematic car fire/smoke), while one robot-on-the-moon example shows scene and physics inconsistencies.

What does the transcript suggest about Llama 4’s development and technical direction?

Llama 4 is expected “this month,” but Meta is described as having faced performance issues—benchmarks falling short in reasoning, math, and humanlike conversation. The model is also said to switch to Mixture of Experts, a technique associated with competitive pressure after DeepSeek’s strong open-source performance. The expectation is that Llama 4 will be multimodal and include reasoning, with variants potentially aimed at local deployment.

How do the image-generation standings and Midjourney V7 comparisons differ by capability?

Ideogram 3.0 is placed around fourth on the image arena (Elo ~1,095), while GPT-4o is described as #1 due to autoregressive native image generation. Midjourney V7 is described as improving coherence and prompt adherence versus V6, but text remains weak: Midjourney is “not optimized for text,” while other models (Recraft, Ideogram 3.0, native GPT-4o image generation) are portrayed as stronger at typography and longer text.

Review Questions

  1. How does the updated OpenAI plan change the user experience compared with the earlier “o3 not standalone” approach?
  2. Which pricing threshold changes Gemini 2.5 Pro’s input/output rates, and why is that relevant for long-context tasks?
  3. What evidence in the transcript suggests that Midjourney V7’s main strength is aesthetics rather than instruction-following or text rendering?

Key Points

  1. OpenAI is scheduling o3 and o4 mini for release in “a couple of weeks,” followed by GPT-5 in a few months, reversing an earlier plan to avoid standalone o3.
  2. Altman cites integration difficulty and compute capacity planning as reasons for the rollout change, pointing to prior server strain from GPT-4o native image generation.
  3. Google’s Gemini 2.5 Pro is in public preview with long-context strengths (up to a million tokens) and token pricing that is framed as cheaper than several competing APIs.
  4. VEO 2 video generation is rolling out in Gemini, but image inputs are not yet available, limiting certain conditioning workflows.
  5. Llama 4 is expected this month but is described as delayed by performance issues; it’s also reported to switch to Mixture of Experts.
  6. In image generation, GPT-4o is described as leading an arena due to autoregressive native image generation, while Midjourney V7 is improving coherence but remains weak at text.

Highlights

OpenAI’s o3 and o4 mini are back on the near-term schedule, with GPT-5 pushed to “a few months,” after earlier messaging suggested o3 would not ship standalone.
Gemini 2.5 Pro’s pricing emphasizes long-context economics: $1.25 per million input tokens and $10 per million output tokens for prompts up to 200,000 tokens.
VEO 2 tests in Gemini show fast, coherent generations, but the lack of image inputs is flagged as a major limitation.
Llama 4 is expected this month, with reports of Meta performance struggles and a switch to Mixture of Experts.
Midjourney V7 shows coherence gains over V6 in comparisons, but text performance remains a weak spot versus other top models.

Topics

  • OpenAI Model Roadmap
  • Gemini 2.5 Pro Pricing
  • VEO 2 Video in Gemini
  • Llama 4 Mixture of Experts
  • Midjourney V7 Image Generation
