Gemini 3 Is THE Building Agent! Demos and Hands-On with Antigravity
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Gemini 3 is being positioned as a major leap in “agentic” coding and multimodal generation, strong enough that one week of hands-on testing led to a blunt conclusion: it outperforms GPT-5 in practical demo quality, especially for code-heavy, interactive results. The model’s headline features include a very large context window (up to a million tokens) and strong multimodal understanding, with the ability to process not just text but also images and videos with audio. Google also frames Gemini 3 as producing more helpful, better-formatted, and more concise responses, and the testing here probes that claim through rapid, code-first experiments.
The most attention-grabbing demonstrations are interactive web-based creations generated from natural-language prompts. A “realistic water physics” scene lets users drop a lemon into a fully 3D, interactive environment with reflections, waves, and click-to-interact behavior. From there, Gemini 3 generates a full solar system simulation with correct orbital motion, time progression, and a navigable 3D view (including twinkling lights and 3D stars). More ambitious prompts produce playable prototypes: a third-person underwater squid game built in a single HTML file, and a jelly-based physics puzzle platformer that includes deformable physics, particles, CRT-style scanline effects, and multi-level behavior. The squid game is described as sometimes hard to control, but it’s still treated as a working prototype with dynamic tentacles, bioluminescent elements, and an escape mechanic using rechargeable ink.
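The solar system demo’s “correct orbital motion” and time progression boil down to a small per-frame calculation. The following is a minimal plain-JavaScript sketch of that idea; the planet data, values, and function names are illustrative assumptions, not the demo’s actual code:

```javascript
// Circular-orbit update: the kind of math a single-file solar-system demo
// evaluates each frame. Planet values are rough illustrative figures (AU, days).
const planets = [
  { name: "Mercury", radius: 0.39, periodDays: 88 },
  { name: "Earth", radius: 1.0, periodDays: 365 },
];

// Position of a planet on its circular orbit after `days` of simulated time.
function orbitPosition(planet, days) {
  const angle = (2 * Math.PI * days) / planet.periodDays; // radians swept so far
  return {
    x: planet.radius * Math.cos(angle),
    y: planet.radius * Math.sin(angle),
  };
}
```

A render loop would call `orbitPosition` for every planet each frame and project the result into the 3D scene; the demo’s time-progression control amounts to scaling how fast `days` advances.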
Coding performance is repeatedly tied to how quickly Gemini 3 can translate prompts into large, runnable codebases. The squid prototype lands at around 700–800 lines of HTML/JS, while other demos are shorter but still substantial. The testing also compares Gemini 3 against GPT-5.1: while GPT-5.1 can produce working demos, its results are described as less playable and less graphically faithful, even with a similar number of prompt attempts (“shots”).
Beyond chat, the transcript spotlights Google’s “Antigravity,” a new agent-oriented development platform meant to orchestrate higher-level tasks across workspaces. Antigravity is described as a “computer use” workflow that can search, plan, and build using Gemini 3 Pro, including unzipping asset packs and generating a full project structure (HTML, CSS, and multiple JS modules). A 2D physics “fire platformer” is created using Creative Commons asset packs; the first attempts hit errors, including agent termination due to code issues and a hard freeze triggered when the player tries to burn objects. After iterative debugging and an overhaul plan, the game eventually loads with improved visuals, textures, and camera behavior, though the core “burning/destroying” mechanic still appears incomplete.
The transcript also contrasts agent speed and reliability: browser-based “PC Part Picker” builds are capable but slow, with the agent taking long enough to risk timeouts. Still, Antigravity’s workflow, which surfaces issues in a “Problems” panel and pushes fixes back into the chat, is praised as a smoother version of the coding-assistant experience.
Community demos extend the theme: 3D voxel scenes from images, Rubik’s cube solvers in Three.js, sign-language recognition using webcam video, and physics simulations like bouncing balls with tunable gravity and rotation. Benchmarks are cited as another pillar of the claim, including references to ARC-AGI-2 and cost/performance comparisons versus GPT-5 variants. Overall, the core takeaway is that Gemini 3 plus Antigravity is moving from “generate code” to “build and iterate on working interactive software,” with multimodal context and agent workflows doing much of the heavy lifting, even if speed, edge-case bugs, and some mechanics still require human steering.
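The “bouncing balls with tunable gravity” style of community demo reduces to a short physics step repeated every frame. The following is a hedged sketch in plain JavaScript; the function name, constants, and the semi-implicit Euler scheme are assumptions for illustration, not the demo’s actual code:

```javascript
// One physics step for a bouncing ball with tunable gravity: the core loop
// behind a "bouncing balls" demo. All names and default values are illustrative.
function stepBall(ball, { gravity = 9.8, dt = 1 / 60, restitution = 0.8, floorY = 0 } = {}) {
  const vy = ball.vy - gravity * dt; // semi-implicit Euler: update velocity first
  let y = ball.y + vy * dt;          // then position
  let newVy = vy;
  if (y < floorY) {                  // bounce: clamp to floor, reflect and damp velocity
    y = floorY;
    newVy = -vy * restitution;
  }
  return { y, vy: newVy };
}
```

Exposing `gravity` and `restitution` as options is what makes such a simulation “tunable”: a UI slider only has to change the values passed into each step.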
Cornell Notes
Gemini 3 is presented as a strong step forward in multimodal understanding and agent-style coding, with a context window up to a million tokens and the ability to work with images and videos (including audio). Hands-on demos emphasize that it can generate large, runnable interactive web projects, often in a single HTML file, such as a 3D water physics scene, a solar system simulation, and playable physics games (squid underwater and jelly platformer). Google’s Antigravity platform is highlighted as the practical layer that orchestrates these tasks: it unzips asset packs, plans project structure, writes code, and iterates when errors appear. While agent browser actions can be slow and some mechanics initially fail (freezes, termination errors), iterative fixes eventually produce working games and smoother debugging workflows.
What concrete capabilities make Gemini 3 feel “agentic” rather than just a chat model in these demos?
How do the demos demonstrate multimodal strength beyond text-only prompting?
What kinds of interactive web-game mechanics does Gemini 3 generate, and how reliable are they?
How does the transcript compare Gemini 3 to GPT-5.1 for coding demos?
What role does Antigravity play in turning prompts into full projects?
Why do browser-based agent tasks (like building a PC list) struggle in this account?
Review Questions
- Which Gemini 3 demo outputs are described as fully interactive in-browser experiences, and what specific interaction (clicking, movement, time controls) does each one include?
- How does Antigravity’s workflow handle errors during project generation, and what evidence suggests it can iterate rather than only generate once?
- What differences in playability and fidelity are claimed between Gemini 3 and GPT-5.1, and which examples are used to support that comparison?
Key Points
1. Gemini 3 is presented as a multimodal, large-context model (up to a million tokens) that can generate runnable, interactive web experiences from prompts.
2. Hands-on demos emphasize 3D physics and interactive mechanics, including click-to-interact water simulations and a solar system simulation with orbital motion and time progression.
3. Gemini 3 can produce sizable single-file web game prototypes (often hundreds of lines of code), including a third-person squid game and a deformable jelly physics platformer.
4. Antigravity is positioned as the practical agent layer that plans projects, unzips asset packs, writes full file structures, and iterates when bugs appear.
5. Browser-based “computer use” tasks can be capable but slow, with research/navigation taking long enough to risk incomplete outputs.
6. Compared with GPT-5.1, Gemini 3 is described as producing demos with better playability and graphical fidelity, even when GPT-5.1 can still reach a working state.
7. Community examples extend the model’s reach to 3D generation, webcam-based sign-language recognition, and interactive physics simulations, reinforcing the coding-and-creation theme.