GPT 5.2 and Image-gen-2 from OpenAI - A final swing at Google?
Based on MattVidPro's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
OpenAI’s latest push, GPT 5.2 plus an image model billed as “Image-gen-2”, is landing as a serious, if uneven, challenge to Google’s top generators. Early community tests suggest the image system comes surprisingly close to Google’s “Nano Banana Pro” on scientifically accurate labeled diagrams, while GPT 5.2’s coding and 3D demos look competitive with Google’s best models, especially in how much complete functionality it can generate as code from prompts alone.
On the image side, a human-cell diagram prompt (“Create a fully labeled diagram of a human cell with at least 10 elements”) becomes the main battleground. The diagram produced by “Hazel Gen 2” (widely assumed to be OpenAI’s GPT Image 2) gets several core labels right: the “human cell” title is spelled correctly, the plasma membrane is placed and detailed well, and many organelles are named, including the cytoskeleton, centrosome, lysosome, mitochondria, Golgi apparatus, vesicles, ribosomes, and smooth/rough endoplasmic reticulum. But accuracy breaks down in specific spots: the Golgi apparatus appears scrambled or mislocated, ribosomes are rendered as tiny dots, and some organelle associations look swapped or misplaced, including confusion between vesicles and mitochondria and a lysosome label that points toward the nucleus rather than its proper location. Even so, the tester’s bottom line is that Nano Banana Pro still wins on scientific accuracy, but by a smaller margin than expected.
A second prompt—a chai recipe with step-by-step instructions—shows fewer obvious errors and highlights a different kind of strength: GPT Image 2’s output includes more structured, step-aligned visuals and a more “textbook” feel, while Nano Banana Pro’s visuals lean more artistic and ingredient-focused. The comparison ultimately comes down to preference: one system is more semantically tidy and step-by-step, the other more visually stylized.
Then the focus shifts to GPT 5.2 itself, newly rolled out alongside the image model. In Element Arena demos, GPT 5.2 generates substantial 3D scenes and interactive projects from text prompts alone, with no image inputs and everything built in code: a Golden Gate Bridge with adjustable weather, time of day, and traffic density; voxel foliage; animated fish; and even a rocket launch with particle effects. Benchmarks cited from LM Arena and AIM-style leaderboards place GPT 5.2 near the top: it’s described as improving over GPT 5.1, with “high thinking” variants trading off cost and accuracy against Claude 4.5 Opus and Gemini 3 Pro.
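As a rough illustration only (not the actual generated code), a scene of that kind usually boils down to a three.js setup with a few exposed parameters. The sketch below assumes three.js and invents a hypothetical timeOfDay/fogDensity parameter pair to stand in for the demo’s weather and time-of-day controls.

```typescript
// Illustrative sketch, not GPT 5.2's output: a minimal three.js scene with
// adjustable "time of day" and "fog density" parameters.
import * as THREE from 'three';

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 1000);
camera.position.set(0, 5, 15);

const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// Hypothetical adjustable parameters, analogous to the sliders in the demo.
const params = { timeOfDay: 0.5, fogDensity: 0.01 }; // 0 = midnight, 0.5 = noon

// Simple "bridge deck" placeholder geometry.
const deck = new THREE.Mesh(
  new THREE.BoxGeometry(20, 0.5, 2),
  new THREE.MeshStandardMaterial({ color: 0xb22222 })
);
scene.add(deck);

// Sun light whose angle and intensity follow timeOfDay; fog stands in for weather.
const sun = new THREE.DirectionalLight(0xffffff, 1);
scene.add(sun);
scene.add(new THREE.AmbientLight(0x404060, 0.5));
const fog = new THREE.FogExp2(0xcfd8dc, params.fogDensity);
scene.fog = fog;

function applyParams(): void {
  const angle = params.timeOfDay * Math.PI * 2;     // full day cycle
  sun.position.set(Math.cos(angle) * 30, Math.sin(angle) * 30, 10);
  sun.intensity = Math.max(0.05, Math.sin(angle));  // dim at night
  fog.density = params.fogDensity;
  scene.background = new THREE.Color().setHSL(0.6, 0.5, 0.2 + 0.5 * Math.max(0, Math.sin(angle)));
}

renderer.setAnimationLoop(() => {
  applyParams();
  renderer.render(scene, camera);
});
```

Changing params.timeOfDay or params.fogDensity at runtime (for instance from UI sliders) is what gives these demos their "adjustable weather and time of day" feel; the actual generated projects are far larger, but the control pattern is the same.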
But practical tests also reveal rough edges. In a water-physics HTML challenge (interactive 3D with reflections and wave simulation), GPT 5.2 initially produces broken or incomplete code, then iterates toward a working version—though it introduces issues like “ghost lemons” and glitchy physics. In a physics-based jelly platformer, GPT 5.2 generates very large codebases (over a thousand lines) and playable-looking graphics, yet physics can become unbalanced or unplayable due to assumptions about frame rate and tuning. The overall verdict: GPT 5.2 is surprisingly competitive and strong as a starting point for real projects, while Google’s Gemini 3 Pro remains formidable—sometimes even better at producing tighter, more immediately functional results in specific physics tasks.
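The frame-rate failure mode is a familiar one: if spring or gravity constants are tuned assuming one physics update per rendered frame, the game plays differently at 30 fps than at 144 fps. A standard fix, sketched below as a general pattern rather than anything from the video, is a fixed-timestep loop with an accumulator; the constants shown are hypothetical tuning values.

```typescript
// General pattern (not from the video): fixed-timestep physics with an
// accumulator, so jelly/spring tuning behaves the same at any display rate.
const FIXED_DT = 1 / 120;   // physics step in seconds
const STIFFNESS = 40;       // spring constant (hypothetical)
const DAMPING = 4;          // damping coefficient (hypothetical)

let position = 0;
let velocity = 0;
let accumulator = 0;
let last = performance.now();

function physicsStep(dt: number): void {
  // Damped spring toward the origin; stands in for the jelly physics.
  const accel = -STIFFNESS * position - DAMPING * velocity;
  velocity += accel * dt;
  position += velocity * dt;
}

function frame(now: number): void {
  accumulator += (now - last) / 1000; // real elapsed time in seconds
  last = now;
  // Advance physics in fixed increments, however many fit this frame,
  // instead of assuming one update per rendered frame.
  while (accumulator >= FIXED_DT) {
    physicsStep(FIXED_DT);
    accumulator -= FIXED_DT;
  }
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```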
Cornell Notes
OpenAI’s GPT 5.2 and “Image-gen-2” are being tested against Google’s Nano Banana Pro and Gemini 3 Pro on two fronts: labeled scientific diagrams and code-generated interactive projects. In the human-cell diagram task, “Hazel Gen 2” (widely assumed to be GPT Image 2) gets many major labels right but still misplaces or scrambles some organelles, leaving Nano Banana Pro slightly ahead on strict scientific accuracy. In the chai recipe task, both systems perform well; the differences lie mainly in how visuals are organized and whether the presentation feels more “textbook” or more “artistic.” GPT 5.2’s code-only 3D demos look highly competitive, but hands-on physics tests show it can produce broken code, “ghost” glitches, and physics-tuning issues that require iteration.
How did GPT Image 2 perform on the human-cell diagram compared with Nano Banana Pro?
What did the chai recipe comparison reveal about each image model’s strengths?
What makes GPT 5.2 stand out in the code-and-3D demos?
Why did GPT 5.2 struggle in the water-physics HTML test?
How did GPT 5.2 perform on the physics jelly platformer, and what went wrong?
Review Questions
- In the human-cell diagram test, which organelles were most likely to be mislocated or misidentified, and why does that matter for “scientific accuracy”?
- What differences in output format (step-aligned visuals vs more artistic ingredient/process visuals) influenced the chai recipe comparison?
- In the water-physics and jelly-game tests, what kinds of failures appeared (broken code, physics instability, frame-rate assumptions), and how would you design prompts or validation steps to reduce them?
Key Points
1. GPT Image 2 (assumed “Hazel Gen 2”) can label many human-cell components correctly, but specific organelles (notably Golgi and lysosome placement) still show scientific inaccuracies.
2. Nano Banana Pro edges GPT Image 2 on strict scientific correctness in the human-cell diagram, though the gap appears smaller than expected.
3. Chai recipe generation works well for both systems; the main differences are how visuals map to steps and ingredients and how “textbook” versus “artistic” the presentation feels.
4. GPT 5.2’s strongest early signal is code-only generation of substantial 3D projects with adjustable parameters (weather, time of day, traffic) and effects like reflections and particles.
5. Hands-on physics coding reveals reliability gaps: GPT 5.2 can output broken or incomplete CodePen code and may introduce glitches such as “ghost” objects.
6. Physics games can fail due to tuning assumptions (like frame rate) and overly strong mechanics, even when graphics and overall structure look impressive.
7. Benchmark chatter places GPT 5.2 near the top against Gemini 3 Pro and Claude 4.5 Opus, with tradeoffs between accuracy (“high thinking”) and cost per task.