Introducing Gemini 3.1 Pro
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Google is rolling out Gemini 3.1 Pro, a “0.1” update that marks a noticeable jump in reasoning and benchmark performance—and, crucially, brings finer control over how long the model thinks before answering. The release matters because it closes a performance gap against top competitors while also giving users a practical knob (thinking level) to trade latency for deeper problem-solving.
The update’s most distinctive signal is the versioning itself: Gemini has previously moved in 0.5 increments, making Gemini 3.1 Pro the first release in the lineup to use a 0.1 step. That small numbering change aligns with a larger technical shift. Benchmarks show the biggest gains when Gemini 3.1 Pro is compared directly to Gemini 3 Pro, including a substantial improvement on Humanity’s Last Exam, where the jump is described as far larger than the step between other recent model releases would suggest. The same pattern appears on ARC-AGI-style evaluations, where Gemini 3.1 Pro posts a much higher score than Gemini 3 Pro.
A key theme behind those gains is reinforcement-learning (RL) training in environments that resemble real-world problem solving, meaning coding and agentic workflows rather than static question answering. The transcript points to RL environments as the likely driver, noting that improvements show up across coding-oriented benchmarks and agent-style setups such as “MCP Atlas,” which suggests multi-step tool use and search. Even design generation tasks show better output quality, with examples described as more capable at producing structured graphics and coding-related artifacts.
Beyond benchmarks, Gemini 3.1 Pro introduces a user-facing capability: multiple “thinking levels.” Where Gemini 3 Pro offered only low or high, Gemini 3.1 Pro adds a medium option, supporting a spectrum from quick responses to multi-minute deliberation. In a live test on an International Math Olympiad problem, setting thinking to high produced the correct answer but took over eight minutes, which the transcript frames as roughly half the time previously required by Deep Think. The practical takeaway is that high thinking effectively behaves like a “mini” version of Gemini Deep Think, delivering deeper reasoning at the cost of latency.
When thinking is set to low, the model responds faster but may fail on harder reasoning tasks, reinforcing the idea that the thinking-level control is not cosmetic—it changes outcome quality.
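As a rough illustration of how such a control might be exposed programmatically, the sketch below builds a JSON request body with a thinking-level field. The field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`) follow the general pattern of the public Gemini REST API but are assumptions here, not confirmed by the transcript; consult the current API reference before relying on them.

```python
import json

def build_request(prompt: str, thinking_level: str) -> str:
    """Sketch a generateContent-style request body with a thinking level.

    The "thinkingConfig"/"thinkingLevel" keys are illustrative assumptions
    modeled on the Gemini API's config pattern, not a documented contract.
    """
    assert thinking_level in {"low", "medium", "high"}
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }
    return json.dumps(body)

# Trade latency for depth: a quick setting for easy prompts,
# a deliberate one for hard reasoning tasks.
fast = build_request("Summarize this paragraph.", "low")
deep = build_request("Solve this olympiad problem.", "high")
```

The point of the knob is visible in the two calls at the bottom: the same request shape, with only the thinking level changed, selects between fast-but-shallow and slow-but-deep behavior.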
For creative and technical use, the transcript also highlights SVG generation, with an example prompt (“a cat riding a bicycle”) producing a rendered SVG that includes recognizable elements like a bicycle, chain, and pedal alignment. For hands-on access, Gemini 3.1 Pro is rolling out across Google apps that use Gemini models, is available on Google Cloud, and can be tried in AI Studio.
Overall, Gemini 3.1 Pro is positioned as an incremental version bump with competitive implications: it puts the Gemini 3 line back in the same tier as Opus 4.6 and the latest GPT models, and it raises the likelihood of follow-on updates from other providers as the ecosystem races to keep up.
Cornell Notes
Gemini 3.1 Pro is a “0.1” update that delivers a clear performance bump over Gemini 3 Pro, especially on reasoning benchmarks such as ARC-AGI-style tasks and on coding and agentic evaluations linked to RL-environment training. The practical differentiator is a new thinking-level control: users can choose low, medium, or high, which changes both latency and accuracy. In a math problem example, high thinking took over eight minutes but produced the correct answer, while low thinking was faster yet missed the solution. The transcript also notes improved generation quality for structured outputs such as SVGs and coding-related artifacts. The model is rolling out broadly across Google apps, is available on Google Cloud, and can be tested in AI Studio.
- What makes Gemini 3.1 Pro stand out compared with earlier Gemini 3 releases?
- How do thinking levels change what users get from Gemini 3.1 Pro?
- Why are RL environments mentioned as a driver of the benchmark improvements?
- What evidence is given that Gemini 3.1 Pro improves both reasoning and generation tasks?
- Where can someone try Gemini 3.1 Pro?
Review Questions
- When would a user prefer low vs high thinking levels in Gemini 3.1 Pro, based on the math example?
- Which benchmark categories (reasoning, coding, agentic workflows, design/SVG) are cited as improving, and what training approach is suggested as the reason?
- What does the transcript imply about how competitive pressure might affect other model providers after Gemini 3.1 Pro’s release?
Key Points
1. Gemini 3.1 Pro is the first “0.1” step in the Gemini versioning pattern and is positioned as a substantive update rather than a dated preview.
2. Benchmark gains are framed as most meaningful when compared directly to Gemini 3 Pro, with large jumps on reasoning-style evaluations.
3. Reinforcement-learning training in environments that resemble real-world coding and agentic tasks is presented as a likely driver of the improved performance.
4. Gemini 3.1 Pro adds thinking-level control (low, medium, high), letting users trade latency for deeper reasoning.
5. In a hard math example, high thinking produced the correct answer but required over eight minutes, while low thinking was faster but incorrect.
6. The model shows improved structured generation capability, including SVG outputs with multiple accurate elements.
7. Gemini 3.1 Pro is rolling out across Google apps, available on Google Cloud, and testable in AI Studio.