
Google just destroyed ChatGPT forever… (Gemini 2.5 Pro)

David Ondrej · 5 min read

Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Gemini 2.5 Pro's early-access upgrade emphasizes stronger coding performance for interactive, aesthetically pleasing web apps, not just text output.

Briefing

Gemini 2.5 Pro is getting an early-access upgrade that pushes hardest on coding—especially building interactive, good-looking web apps—while also improving multimodal performance and long-context handling. Google positions the change as a meaningful step beyond the prior Gemini 2.5 Pro, highlighting new capabilities that can turn an input image into a code-based representation of its natural behavior, including subtle animation (like a leaf growing or moving in the wind). The practical takeaway: developers can ask for full working front-end code and get something that’s not just functional, but visually and behaviorally closer to what a user expects.

That coding jump is reflected in a webdev leaderboard where Gemini 2.5 Pro sits at the top with a score of 1419, ahead of Claude 3.7 Sonnet (second) and the previous Gemini 2.5 Pro (third). The scoring method emphasizes human preference for apps that look aesthetically pleasing and work in practice—not “minimum viable” demos that fail to hold up. The same leaderboard context credits Gemini’s strengths in multimodal tasks, long context, and video understanding, suggesting the upgrade isn’t limited to text-to-code.

A key operational improvement is also getting attention: Cursor's CEO reports a significant reduction in "failure to call tools," a problem that users reportedly hit with the earlier Gemini 2.5 Pro when working inside Cursor. That matters because tool-calling failures can break agent workflows, especially when an AI needs to run code, access external capabilities, or follow structured steps.

The transcript then walks through hands-on usage. In the Gemini app, users select “2.5 Pro” and can generate large amounts of code quickly. A demo prompts the model to one-shot a playable, WASD-controllable 2D Minecraft-like clone. The model responds with hundreds of lines of HTML and JavaScript, producing a working in-browser game where users can select blocks, place them with right-click, move around, break blocks, and dig—though jumping isn’t implemented. The point isn’t perfection from a single prompt; it’s that the model can reliably produce a coherent interactive app with minimal iteration.

For deployment and agent-building, the transcript highlights multiple integration paths: using Google AI Studio or the Gemini app UI (with direct code execution), and also selecting Gemini 2.5 Pro inside Vectal for task-driven AI agents. It then shifts to Cursor for Python-based agents, showing how to configure “Gemini 2.5 Pro Max” in Cursor settings and how to wire Gemini 2.5 Pro through OpenRouter using an OpenAI-compatible endpoint. The setup includes installing the OpenAI Python package, activating a Conda environment, creating an OpenRouter API key, and running a multimodal test where the model describes a Wikipedia image of Wisconsin–Madison.
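Because OpenRouter exposes an OpenAI-compatible chat-completions endpoint, the wiring described above boils down to a single authenticated POST. A minimal stdlib-only sketch of that request is below; the model slug is an assumption (check OpenRouter's model list for the exact Gemini 2.5 Pro identifier), and the transcript itself uses the OpenAI Python package, which wraps this same endpoint.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "google/gemini-2.5-pro-preview"  # assumed slug; verify on OpenRouter


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for OpenRouter."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    key = os.environ.get("OPENROUTER_API_KEY")
    if key:  # only hit the network when a key is actually configured
        req = build_request("Say hello in one sentence.", key)
        with urllib.request.urlopen(req) as resp:
            body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
```

The same payload shape works unchanged through the OpenAI Python client by pointing its `base_url` at `https://openrouter.ai/api/v1`, which is the route the transcript takes.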

Cost and speed are treated as practical advantages: the OpenRouter activity details show fast token throughput and extremely low per-request cost (fractions of a cent), while the model returns an accurate description of the image. The overall message is that Gemini 2.5 Pro’s upgrade isn’t just better answers—it’s better at producing working code, integrating into agent systems, and doing so efficiently enough to support real development workflows.

Cornell Notes

Gemini 2.5 Pro’s early-access upgrade targets coding quality, especially interactive web apps, while also improving multimodal and long-context performance. Google’s examples and a webdev leaderboard place Gemini 2.5 Pro at the top (1419), emphasizing human preference for apps that are both attractive and functional. A reported reduction in tool-calling failures inside Cursor matters for agent reliability, since tool calls often power multi-step workflows. The transcript demonstrates one-shot generation of a playable 2D Minecraft-like clone using HTML/JavaScript and shows how to build Python agents by routing Gemini 2.5 Pro through OpenRouter with an OpenAI-compatible endpoint. Cost efficiency is highlighted via low per-request pricing and fast token throughput.

What specific improvements are claimed for Gemini 2.5 Pro, and how are they measured?

The upgrade is framed as a significant jump in coding ability, particularly for building compelling interactive web apps. Measurement is tied to a webdev arena leaderboard that scores human preference for models’ ability to produce aesthetically pleasing and functional apps (not bare-bones demos). In that leaderboard, Gemini 2.5 Pro is listed at 1419, ahead of Claude 3.7 Sonnet and the previous Gemini 2.5 Pro. The transcript also links the leaderboard’s criteria to multimodal strength, long context, and video understanding.

Why does “failure to call tools” matter for developers using Gemini inside Cursor?

Tool-calling failures can derail agent workflows that depend on structured actions, like running code, invoking external functions, or following multi-step plans. Cursor's CEO reports a significant reduction in this failure mode with the new Gemini 2.5 Pro, which aligns with the transcript's emphasis on building agents that can reliably complete tasks rather than stall when tool calls are needed.

How does the transcript demonstrate Gemini 2.5 Pro’s coding capability in practice?

A one-shot prompt asks for a working 2D clone of Minecraft that’s playable and controllable via WASD. Gemini generates roughly 700 lines of HTML and JavaScript, and the resulting app runs in the Gemini preview. The demo includes block selection via inventory, scrolling to change the selected block, right-click placement, movement, breaking blocks, and digging; jumping is not supported in the first pass.

What does the setup for Python agents look like using OpenRouter and Gemini 2.5 Pro?

The transcript uses Cursor to create a Python file (agents.py) and routes Gemini 2.5 Pro through OpenRouter. Steps include: installing the OpenAI Python package, activating a Conda environment (test), creating an OpenRouter API key, and pasting that key into the code. The test sends a multimodal request (an image from Wikipedia) to verify the model can interpret images and return a descriptive response.
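In the OpenAI-compatible format the transcript routes through, a multimodal request mixes text and image parts in a single message's content list. A minimal sketch of that shape follows; the helper name and placeholder URL are illustrative, not from the transcript (the actual test uses a Wikipedia image).

```python
import json


def build_image_message(question: str, image_url: str) -> dict:
    """One user message combining text and an image, in the
    OpenAI-compatible multimodal format OpenRouter accepts."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder URL; the transcript's test points at a Wikipedia image.
message = build_image_message(
    "What is in this image?", "https://example.com/photo.jpg"
)
payload = {
    "model": "google/gemini-2.5-pro-preview",  # assumed slug
    "messages": [message],
}
print(json.dumps(payload, indent=2))
```

Sending this payload to the chat-completions endpoint is what verifies the model can interpret the image and return a description, as the transcript's test does.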

What cost and performance details are provided for the multimodal test?

OpenRouter activity details show the request used model ID "GM25 for preview" (as displayed in the transcript) with a Google AI Studio provider. The first token arrives after about 3.5 seconds, throughput is about 162 tokens per second, total tokens are 523 (with most spent on reasoning), and the completion is about 118 tokens. The final cost is described as $0.001 (effectively next to nothing, framed as a fraction of a penny).
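The token counts and cost above come straight from the transcript's OpenRouter log; the per-1k-token rate below is derived arithmetic from that single request, not a quoted price.

```python
# Figures as reported in the OpenRouter activity log for the one
# multimodal test request.
total_tokens = 523       # total tokens, most spent on reasoning
completion_tokens = 118  # completion portion of the total
cost_usd = 0.001         # reported total cost for the request

# Effective price implied by this one request (derived, not quoted).
cost_per_1k = cost_usd / total_tokens * 1_000
print(f"~${cost_per_1k:.4f} per 1k tokens at this request's token mix")
```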

How does the transcript suggest using Gemini 2.5 Pro beyond coding—like task management and agents?

It points to Vectal as a place to select Gemini 2.5 Pro and run AI agents on tasks. The described workflow moves tasks into Vectal, selects a model, and has an agent identify the most important task and help complete it. It also claims Perplexity Pro is built into these agents for web search, reducing the need for multiple subscriptions.

Review Questions

  1. What leaderboard criteria are used to judge web app quality, and how does Gemini 2.5 Pro rank relative to Claude 3.7 Sonnet?
  2. In the Cursor + OpenRouter setup, what are the minimum steps needed to run a multimodal test with Gemini 2.5 Pro?
  3. What limitations show up in the one-shot 2D Minecraft-like demo (and what controls still work)?

Key Points

  1. Gemini 2.5 Pro's early-access upgrade emphasizes stronger coding performance for interactive, aesthetically pleasing web apps, not just text output.

  2. A webdev arena leaderboard places Gemini 2.5 Pro at the top with a score of 1419, ahead of Claude 3.7 Sonnet (second) and the previous Gemini 2.5 Pro (third).

  3. Cursor users are told to expect fewer "failure to call tools" issues with the upgraded Gemini 2.5 Pro, improving agent reliability.

  4. A one-shot prompt can generate a playable 2D Minecraft-like clone using HTML and JavaScript, including block placement, breaking, and digging (but not jumping).

  5. Gemini 2.5 Pro can be used directly in the Gemini app or via Google AI Studio, with the transcript favoring Gemini's simpler UI and direct code execution.

  6. Python agent building is demonstrated by routing Gemini 2.5 Pro through OpenRouter using an OpenAI-compatible endpoint, requiring an OpenRouter API key and the OpenAI Python package.

  7. The multimodal test is presented as fast and extremely low cost per request, with token throughput and total token counts logged in OpenRouter activity.

Highlights

Gemini 2.5 Pro tops the webdev arena leaderboard at 1419, with scoring based on human preference for apps that are both functional and visually appealing.
Cursor’s CEO reports a significant reduction in tool-calling failures that previously affected Gemini 2.5 Pro users.
A single prompt produced a working 2D Minecraft-like game in HTML/JavaScript with WASD movement and block interactions.
The OpenRouter multimodal test returns an accurate image description while logging very low cost and fast token throughput.

Topics

Mentioned

  • David Ondrej
  • WASD
  • LLM
  • API
  • UI
  • MVP
  • HTML
  • CS
  • JavaScript
  • Python
  • Conda
  • MRR
  • WASD controllable