Introducing Gemini 3.1 Pro
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer control over how long the model thinks before answering. The release matters because it closes a performance gap against top competitors while also giving users a practical knob (thinking level) to trade latency for deeper problem-solving.
The update’s most distinctive signal is the versioning itself: Gemini has previously moved in 0.5 increments, making Gemini 3.1 Pro the first release in the lineup to use a 0.1 step. That small numbering change aligns with a larger technical shift. Benchmarks show the biggest gains when Gemini 3.1 Pro is compared directly to Gemini 3 Pro, including a substantial improvement on Humanity’s Last Exam, where the jump is described as far larger than the step between other recent model releases would suggest. The same pattern appears on ARC-AGI-style evaluations, where Gemini 3.1 Pro posts a much higher score than Gemini 3 Pro.
A key theme behind those gains is reinforcement-learning (RL) training in environments that resemble real-world problem solving, meaning coding and agentic workflows rather than static question answering. The transcript points to RL environments as the likely driver, noting that improvements show up across coding-oriented benchmarks and agent-style setups such as “MCP Atlas,” which suggests multi-step tool use and search. Even design generation tasks show better output quality, with examples described as more capable at producing structured graphics and coding-related artifacts.
Beyond benchmarks, Gemini 3.1 Pro introduces a user-facing capability: multiple “thinking levels.” Where Gemini 3 Pro offered only low or high, Gemini 3.1 Pro adds a medium option, supporting a spectrum from quick responses to multi-minute deliberation. In a live test on an International Math Olympiad problem, setting thinking to high produced the correct answer but took over eight minutes, which the transcript frames as roughly half the time previously required by Deep Think. The practical takeaway is that high thinking effectively behaves like a “mini” version of Gemini Deep Think, delivering deeper reasoning at the cost of latency.
When thinking is set to low, the model responds faster but may fail on harder reasoning tasks, reinforcing the idea that the thinking-level control is not cosmetic—it changes outcome quality.
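As a rough illustration of how such a control might be exposed programmatically, the sketch below builds a JSON request body with a thinking-level field. The field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow the general pattern of the public Gemini REST API but are assumptions here, not confirmed by the transcript; consult the current API reference before relying on them.

```python
import json

def build_request(prompt: str, thinking_level: str) -> str:
    """Sketch a generateContent-style request body with a thinking level.

    The "thinkingConfig"/"thinkingLevel" keys are illustrative assumptions
    modeled on the Gemini API's config pattern, not a documented contract.
    """
    assert thinking_level in {"low", "medium", "high"}
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body)

# Trade latency for depth: a quick setting for easy prompts,
# a deliberate one for hard reasoning tasks.
fast = build_request("Summarize this paragraph.", "low")
deep = build_request("Solve this olympiad problem.", "high")
```

The point of the knob is visible in the two calls at the bottom: the same request shape, with only the thinking level changed, selects between fast-but-shallow and slow-but-deep behavior.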
For creative and technical use, the transcript also highlights SVG generation, with an example prompt (“a cat riding a bicycle”) producing a rendered SVG that includes recognizable elements like a bicycle, chain, and pedal alignment. For hands-on access, Gemini 3.1 Pro is rolling out across Google apps that use Gemini models, is available on Google Cloud, and can be tried in AI Studio.
Overall, Gemini 3.1 Pro is positioned as an incremental version bump with competitive implications: it puts the Gemini 3 line back in the same tier as Opus 4.6 and the latest GPT models, and it raises the likelihood of follow-on updates from other providers as the ecosystem races to keep up.
Cornell Notes
Gemini 3.1 Pro is a “0.1” update that delivers a clear performance bump over Gemini 3 Pro, especially on reasoning benchmarks such as ARC-AGI-style tasks and on coding and agentic evaluations linked to RL-environment training. The practical differentiator is a new thinking-level control: users can choose low, medium, or high, which changes both latency and accuracy. In a math problem example, high thinking took over eight minutes but produced the correct answer, while low thinking was faster yet missed the solution. The transcript also notes improved generation quality for structured outputs such as SVGs and coding-related artifacts. The model is rolling out broadly across Google apps, is available on Google Cloud, and can be tested in AI Studio.
- What makes Gemini 3.1 Pro stand out compared with earlier Gemini 3 releases?
- How do thinking levels change what users get from Gemini 3.1 Pro?
- Why are RL environments mentioned as a driver of the benchmark improvements?
- What evidence is given that Gemini 3.1 Pro improves both reasoning and generation tasks?
- Where can someone try Gemini 3.1 Pro?
Review Questions
- When would a user prefer low vs high thinking levels in Gemini 3.1 Pro, based on the math example?
- Which benchmark categories (reasoning, coding, agentic workflows, design/SVG) are cited as improving, and what training approach is suggested as the reason?
- What does the transcript imply about how competitive pressure might affect other model providers after Gemini 3.1 Pro’s release?
Key Points
1. Gemini 3.1 Pro is the first “0.1” step in the Gemini versioning pattern and is positioned as a substantive update rather than a dated preview.
2. Benchmark gains are framed as most meaningful when compared directly to Gemini 3 Pro, with large jumps on reasoning-style evaluations.
3. Reinforcement-learning training in environments that resemble real-world coding and agentic tasks is presented as a likely driver of the improved performance.
4. Gemini 3.1 Pro adds thinking-level control (low, medium, high), letting users trade latency for deeper reasoning.
5. In a hard math example, high thinking produced the correct answer but required over eight minutes, while low thinking was faster but incorrect.
6. The model shows improved structured generation capability, including SVG outputs with multiple accurate elements.
7. Gemini 3.1 Pro is rolling out across Google apps, available on Google Cloud, and testable in AI Studio.