
AI RECAP: Rumored GPT-4o Large Model & Gemini Live vs GPT-4o Advanced Voice

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The “Q* / strawberry” rumor claims a new reasoning architecture could improve first-pass problem solving, illustrated by a letter-count prompt that often fails today.

Briefing

A swirl of “strawberry/Q*” rumors about OpenAI’s next reasoning model is colliding with concrete updates—yet the most important question remains unanswered: whether any new model actually changes what users can do inside ChatGPT. The speculation centers on a rumored “Q* / strawberry” architecture described as a new reasoning engine meant to give large language models more human-like problem solving. A simple test is used to illustrate the gap today: asking a model how many “R” letters are in “strawberry” often produces an incorrect count, even though it can sometimes self-correct after being prompted.

The hype takes on a more concrete feel because high-profile OpenAI figures appear to engage with strawberry-themed posts on X. Sam Altman replies “amazing” to a tweet from an account using strawberries as its branding, and other AI insiders amplify the attention. At the same time, the official ChatGPT account posts about a “new GPT-4o Omni model” being available in ChatGPT, but the transcript’s author reports not noticing any obvious differences in the app: no clear model switch, no access to new “Advanced voice,” and no change in image generation behavior (still tied to the DALL-E 3 API and existing image tools).

Still, one prediction seems to land on timing. The strawberry-themed account had claimed something would arrive from OpenAI at 10:00 a.m., and OpenAI did release an updated SWE-bench iteration—a benchmark designed to evaluate how well AI systems solve real-world software tasks. That matters because it’s a more developer-relevant yardstick than generic chat benchmarks. The same account also points to Thursday as a likely day for a “GPT-4o Omni large” release, with the idea that OpenAI might pair a new model with a benchmark drop to justify performance gains.

Even if a Thursday release arrives, the transcript argues that “better reasoning” won’t be convincing without a clear explanation of how it works and when it beats alternatives like prompting tricks or retrieval/search-based systems such as Perplexity. The core frustration is transparency: users want to know what’s actually changing under the hood, not just what’s being teased.

On the Google side, the day’s major announcement is “Gemini Live,” positioned as a voice-first experience on Android. The transcript’s author demonstrates a Gemini Live flow with multiple selectable voices and a conversation that brainstorms science activities, but then draws a key distinction: Gemini Live appears to convert speech to text, generate a response in text, and then convert that text back to audio—meaning it’s not “native” multimodal voice in the same way as OpenAI’s rumored/advanced voice approach. The Gemini Live feature set also includes Android integration and small utilities (like a calendar extension that reads concert flyers), but nothing presented as a leap that would force a platform switch. Overall, the transcript frames OpenAI’s strawberry/Q* mystery as the more consequential story, while Google’s Gemini Live looks incremental compared with what’s already available elsewhere.

Cornell Notes

Rumors about OpenAI’s “Q* / strawberry” architecture claim a new reasoning engine could make large language models solve problems more like humans. A common illustration is the “strawberry” letter-count prompt, where models often give wrong answers unless users prompt for self-correction. Despite strawberry-themed posts from prominent OpenAI figures and a claim of an upcoming “GPT-4o Omni large” on Thursday, reported testing finds no obvious model changes in ChatGPT, and image generation still appears tied to existing DALL-E 3-based tools. OpenAI did release an updated SWE-bench benchmark at the predicted time, a meaningful signal for developers because it targets real software task performance. Google’s Gemini Live launches on Android with voice and multiple voices, but it appears to rely on speech-to-text and text-to-speech rather than native multimodal voice, making it less of a direct competitor to advanced voice systems.

What is the “strawberry/Q*” reasoning claim, and how is it tested in the transcript?

The rumor describes a new “Q* / strawberry” architecture as a reasoning engine meant to deliver more human-like problem solving. The transcript uses a simple prompt test: asking a model how many “R” letters are in the word “strawberry.” Many large language models give an incorrect count at first, though they can sometimes correct themselves after being told the answer is wrong and asked to spell it out. The implication is that current LLM behavior often lacks reliable first-pass reasoning.
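The failure is striking because the correct answer is trivially computable. A one-line check (not part of the transcript, just ground truth for the prompt) shows what a reliable first pass should return:

```python
# Ground truth for the letter-count prompt discussed above:
# how many "r" letters appear in "strawberry"?
word = "strawberry"
count = word.lower().count("r")
print(count)  # prints 3
```

LLMs tend to miss this because they operate on subword tokens rather than individual characters, which is exactly the kind of first-pass reasoning gap the rumored architecture is said to address.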

Why do strawberry-themed posts from prominent accounts matter to the rumor’s credibility?

The transcript highlights engagement on X: Sam Altman replies “amazing” to a strawberry-branded account’s tweet, and other AI insiders react skeptically or amplify the attention. Because these are high-visibility figures, the author treats the interaction as a potential hint that something is in motion—while still acknowledging it could be unrelated hype.

What concrete OpenAI update lands at the predicted time, and why is it significant?

At 10:00 a.m., OpenAI releases a new iteration of SWE-bench. SWE-bench is described as a benchmark that more reliably evaluates AI models’ ability to solve real-world software issues. That makes it relevant to developers because it measures task performance on software problems rather than only conversational quality.
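For context, a SWE-bench-style task pairs a real GitHub issue with a repository snapshot, and a model’s patch counts as a success only if the repo’s tests pass afterward. The sketch below is an illustrative mock with made-up field names and data, not an actual benchmark record or harness:

```python
# Illustrative mock of a SWE-bench-style instance (hypothetical data,
# not the real schema). The model sees the issue and the repo state,
# and must produce a patch.
task = {
    "repo": "example-org/example-lib",          # hypothetical repository
    "problem_statement": "Sorting crashes on empty input lists.",
    "base_commit": "abc123",                    # repo state given to the model
}

def evaluate(model_patch, run_tests):
    """An instance is 'resolved' iff the hidden test suite passes
    after the model's patch is applied to the base commit."""
    return run_tests(task["base_commit"], model_patch)
```

This pass/fail-on-real-tests design is why the transcript treats SWE-bench as a more developer-relevant yardstick than conversational benchmarks.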

What does the transcript claim about “GPT-4o Omni” and user-visible changes in ChatGPT?

After the official ChatGPT account announces a “new GPT-4o Omni model” in ChatGPT, the author reports running tests and noticing no obvious differences across desktop and phone. They also report lacking access to “Advanced voice” and seeing no change in image generation behavior, which still appears to use the DALL-E 3 API via existing image tools.

How does Gemini Live differ from OpenAI’s advanced voice approach, according to the transcript?

Gemini Live is demonstrated as a voice experience with multiple selectable voices. But the transcript argues it is not native multimodal voice: it appears to convert the user’s speech to text, generate a response as text, then convert that text back to audio using text-to-speech. By contrast, OpenAI’s advanced voice mode is described as generating voice natively from the model and capturing emotional nuance and expressive styles.
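The distinction the transcript draws can be sketched as two pipelines. Everything below is a hypothetical stand-in: the stage functions are placeholders, not real Gemini or OpenAI APIs, and strings stand in for audio and text payloads:

```python
# Hypothetical sketch of the two voice architectures contrasted above.
# These functions are illustrative stubs, not real APIs.

def speech_to_text(audio):
    return audio.removeprefix("audio:")         # transcribe user audio

def language_model(text):
    return f"reply to '{text}'"                 # text-only generation

def text_to_speech(text):
    return "audio:" + text                      # synthesize a voice

def cascaded_voice(audio_in):
    """Gemini Live, per the transcript: three separate stages.
    Tone and emphasis in the input audio are discarded at stage one."""
    return text_to_speech(language_model(speech_to_text(audio_in)))

def multimodal_model(audio):
    return "audio:reply (tone preserved)"       # audio in, audio out

def native_voice(audio_in):
    """Advanced voice, as described: one model handles audio directly,
    so expressive cues can survive end to end."""
    return multimodal_model(audio_in)

print(cascaded_voice("audio:hello"))  # prints audio:reply to 'hello'
```

The practical consequence of the cascaded design is that whispering, sarcasm, or urgency in the user’s speech is flattened to plain text before the model ever sees it.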

Review Questions

  1. What specific prompt example is used to illustrate the gap between current LLM behavior and human-like reasoning, and why does it matter?
  2. Which OpenAI release mentioned in the transcript is tied to developer evaluation (benchmarks), and what does SWE-bench measure?
  3. According to the transcript, what technical pipeline does Gemini Live appear to use, and how does that affect its claim to “advanced voice” capability?

Key Points

  1. The “Q* / strawberry” rumor claims a new reasoning architecture could improve first-pass problem solving, illustrated by a letter-count prompt that often fails today.

  2. Prominent OpenAI engagement with strawberry-themed posts increases attention, but reported user testing finds no obvious ChatGPT model changes yet.

  3. OpenAI’s release of an updated SWE-bench at the predicted time is a concrete, developer-relevant signal because it targets real software task performance.

  4. The transcript’s author reports no visible differences in GPT-4o Omni behavior, no access to Advanced voice, and no clear change in image generation pathways tied to DALL-E 3.

  5. The Thursday “GPT-4o Omni large” claim remains unverified in the transcript, with skepticism focused on the need for transparent explanations and measurable advantages.

  6. Google’s Gemini Live launches on Android with multiple voices and useful utilities, but it appears to rely on speech-to-text and text-to-speech rather than native multimodal voice.

  7. Gemini Live is framed as incremental compared with advanced voice systems, while the OpenAI strawberry mystery is treated as the more consequential storyline.

Highlights

SWE-bench is positioned as the most meaningful “proof point” so far because it evaluates real software problem solving rather than generic chat quality.
Despite official talk of a “GPT-4o Omni” update, the transcript’s author reports no noticeable model differences in ChatGPT during testing.
Gemini Live’s voice experience appears to follow a speech-to-text → text response → text-to-speech pipeline, which the transcript contrasts with “native” multimodal voice behavior.

Topics

  • OpenAI Q* Rumors
  • GPT-4o Omni
  • SWE-bench
  • Gemini Live
  • Advanced Voice