Google won. (Gemini 2.5 Pro is INSANE)
Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Gemini 2.5 Pro is framed as a major upgrade in built-in reasoning, with faster responses and strong benchmark performance after release.
Briefing
Gemini 2.5 Pro is being positioned as a major step forward in “thinking” AI—delivering faster responses and strong benchmark performance while Google pushes more reasoning capability directly into the model rather than bolting it on after the fact. Early results show it moving to the top of popular leaderboards immediately after release, including beating well-known competitors such as GPT-4.5 and DeepSeek on benchmark-style comparisons. The practical takeaway is that the model is not just smarter on paper; it’s also fast enough and cost-lean enough to be usable as a default in real workflows.
A key reason for the excitement is how Google is packaging reasoning. Earlier “thinking” features in Google models were described as patched-on behavior—prompting the model to think, then returning an answer after a follow-up. With Gemini 2.5 Pro, the thinking behavior is treated as a baked-in capability, designed to handle more complex prompts and support context-aware agent behavior. That shift matters because it changes how reliably the model can follow multi-step instructions and operate under richer context.
The transcript also ties performance to concrete product implications. In T3 Chat, Gemini models are already being added early, with rate limits described as high enough that pro subscribers should not hit frequent throttling. The creator expects pricing to be competitive as well, noting that Google has confirmed Pro models will be available on paid tiers soon—an important operational detail because paid access is typically what keeps rate limits and service availability stable.
Benchmark highlights include strong performance on math and a notable win on an exam that OpenAI itself had built, framed as a rare case of one company outperforming another on its own test. Coding, however, remains mixed: thinking models can feel worse in editor workflows because they take longer to respond, and the added internal context can sometimes lead to “self-gaslighting” errors. Examples include a physics-style prompt where the model’s reasoning produces an incorrect outcome (such as gravity behaving the wrong way) and a “ball in a hexagon” coding test where it performs well until a corner case lets the ball fall out of the shape.
Beyond model quality, the transcript argues that Google’s advantage is structural. Model performance is framed as a function of three inputs: data, hardware, and science. Google is portrayed as unusually strong across all three—especially because it builds custom hardware and can tightly integrate model training and inference with that infrastructure. The discussion contrasts this with other AI companies that may excel in only one area (science, data, or hardware) and must partner externally for the rest.
The transcript also emphasizes context length and multimodal tooling as differentiators. Gemini’s 1 million token input window is highlighted as “nuts,” enabling tasks that would be impossible with smaller context limits. Real examples include extracting dozens of YouTube and SoundCloud links from a large, messy blog page by feeding the model a huge HTML blob and having it generate executable JavaScript to pull the links. Native PDF support is also praised, including parsing images, graphs, and charts from PDFs.
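The transcript doesn’t show the actual script, but a minimal sketch of the kind of extraction code the model might generate for that task could look like the following (written here as TypeScript; the assumption is that it runs in the browser against the blog page’s DOM, and that youtube.com, youtu.be, and soundcloud.com are the hosts of interest):

```ts
// Hypothetical extraction script of the kind described in the transcript.
// Assumption: it runs in a browser context (e.g. the devtools console) on the
// loaded blog page, so `document` refers to that page's DOM.
const isTargetHost = (url: string): boolean =>
  /(youtube\.com|youtu\.be|soundcloud\.com)/i.test(url);

// Collect every anchor's absolute URL, keep only YouTube/SoundCloud links,
// and deduplicate while preserving order.
const links = Array.from(document.querySelectorAll<HTMLAnchorElement>("a[href]"))
  .map((a) => a.href)
  .filter(isTargetHost);

const unique = [...new Set(links)];
console.log(unique.join("\n"));
```

The role of the large context window in the transcript’s example is that the entire messy HTML blob can be pasted into the prompt at once, with the model returning a script along these lines for the user to run.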
Finally, the creator flags a limitation: the “thinking” traces seen in Google’s AI Studio are not exposed through the API, meaning developers using API access may not get the same visible reasoning data—only the final response after a wait. Even with that caveat, Gemini 2.5 Pro is described as a compelling combination of speed, reasoning capability, and practical tooling, reinforcing the claim that Google is well positioned to lead the AI race.
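To make that caveat concrete, here is a minimal sketch, not the creator’s actual setup, of a plain API call using Google’s public @google/generative-ai Node SDK; the model identifier below is an assumption, so substitute whatever Gemini 2.5 Pro id the API exposes at the time:

```ts
// Minimal sketch of a Gemini API call via the @google/generative-ai Node SDK.
// The model id below is an assumption; use the current 2.5 Pro identifier.
import { GoogleGenerativeAI } from "@google/generative-ai";

async function main(): Promise<void> {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
  const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

  const result = await model.generateContent(
    "Summarize the attached release notes in three bullet points."
  );

  // The response carries only the final text. The step-by-step "thinking"
  // shown in AI Studio is not part of this payload, which is the limitation
  // flagged in the transcript.
  console.log(result.response.text());
}

main().catch(console.error);
```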
Cornell Notes
Gemini 2.5 Pro is presented as a leap in “thinking” AI: reasoning capability is treated as a built-in model feature rather than a prompt-time add-on. Early leaderboard results and benchmark comparisons are described as strong, including wins against major competitors and improved math performance. In practical use, the model’s speed and cost are framed as good enough to be usable as a default in tools like T3 Chat, with pro subscribers getting access to newly added models. The transcript also highlights Google’s differentiators beyond raw intelligence—especially a 1 million token context window, native PDF support, and multimodal capabilities. A key caveat: the detailed thinking traces visible in AI Studio are not available via API output, limiting what developers can inspect.
What changes with Gemini 2.5 Pro’s “thinking” compared with earlier approaches?
Why does the transcript treat speed and cost as central, not just benchmark scores?
Where does Gemini 2.5 Pro look strongest, and where does it still struggle?
How does the 1 million token context window translate into real tasks?
What does the transcript say about why Google is ahead—data, science, or hardware?
What limitation affects developers using Gemini via API rather than AI Studio?
Review Questions
- How does embedding “thinking” directly into the model change expected behavior compared with prompt-time thinking workflows?
- What kinds of tasks benefit most from a 1 million token context window, and what example from the transcript illustrates that?
- Why might a thinking model be less satisfying in coding editors even when it performs well on benchmarks?
Key Points
1. Gemini 2.5 Pro is framed as a major upgrade in built-in reasoning, with faster responses and strong benchmark performance after release.
2. Google’s “thinking” approach shifts reasoning from a prompt-time workaround to a baked-in model capability intended for complex prompts and context-aware agents.
3. T3 Chat is adding Gemini models early, with rate limits described as high enough for pro subscribers and Pro pricing expected to land on paid tiers soon.
4. Gemini 2.5 Pro shows strong math performance, but coding reliability can still break on corner cases, especially when internal reasoning creates plausible but wrong outcomes.
5. Google’s differentiators include a 1 million token context window, native PDF parsing (including images, graphs, and charts), and multimodal features such as image parsing and editing.
6. A key developer caveat: AI Studio exposes thinking traces, but the API output reportedly does not include that reasoning data.
7. The transcript argues Google’s advantage comes from unusually strong alignment across data, science, and custom hardware, reducing its dependence on external partners.