
Gemini 3 is THE building Agent! Demos, Hands-on with Antigravity

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.

TL;DR

Gemini 3 is presented as a multimodal, large-context model (up to a million tokens) that can generate runnable, interactive web experiences from prompts.

Briefing

Gemini 3 is positioned as a major leap in "agentic" coding and multimodal generation, strong enough that one week of hands-on testing led to a blunt conclusion: it outperforms GPT-5 in practical demo quality, especially for code-heavy, interactive results. The model's headline features include a very large context window (up to a million tokens) and strong multimodal understanding, covering not just text but also images and videos with audio. Google also frames Gemini 3 as producing more helpful, better-formatted, and more concise responses, and the hands-on testing leans heavily on that claim through rapid, code-first experiments.

The most attention-grabbing demonstrations are interactive web-based creations generated from natural-language prompts. A “realistic water physics” scene lets users drop a lemon into a fully 3D, interactive environment with reflections, waves, and click-to-interact behavior. From there, Gemini 3 generates a full solar system simulation with correct orbital motion, time progression, and a navigable 3D view (including twinkling lights and 3D stars). More ambitious prompts produce playable prototypes: a third-person underwater squid game built in a single HTML file, and a jelly-based physics puzzle platformer that includes deformable physics, particles, CRT-style scanline effects, and multi-level behavior. The squid game is described as sometimes hard to control, but it’s still treated as a working prototype with dynamic tentacles, bioluminescent elements, and an escape mechanic using rechargeable ink.

Coding performance is repeatedly tied to how quickly Gemini 3 can translate prompts into large, runnable codebases. The squid prototype lands around 700–800 lines of HTML/JS, while other demos are shorter but still substantial. The testing also compares Gemini 3 against GPT-5.1: while GPT-5.1 can produce working demos, the results are described as less playable and less graphically faithful, even when a similar number of prompting "shots" is used.

Beyond chat, the transcript spotlights Google's Antigravity, a new agent-oriented development platform meant to orchestrate higher-level tasks across workspaces. Antigravity is described as a "computer use" workflow that can search, plan, and build using Gemini 3 Pro, including unzipping asset packs and generating a full project structure (HTML, CSS, and multiple JS modules). A 2D physics "fire platformer" is created from Creative Commons asset packs; the first attempts hit errors and freezes, including agent termination due to code issues and a hard stop when the game freezes after attempting to burn objects. After iterative debugging and an overhaul plan, the game eventually loads with improved visuals, textures, and camera behavior, though the core burning/destroying mechanic still appears incomplete.

The transcript also contrasts agent speed and reliability: browser-based "PC Part Picker" builds are capable but slow, with the agent taking long enough to risk timeouts. Still, Antigravity's workflow, which surfaces issues in a "problems" panel and pushes fixes back into the chat, is praised as a smoother version of the coding-assistant experience.

Community demos extend the theme: 3D voxel scenes from images, Rubik's cube solvers in Three.js, sign-language recognition using webcam video, and physics simulations like bouncing balls with tunable gravity and rotation. Benchmarks are cited as another pillar of the claim, including references to ARC-AGI 2 and cost/performance comparisons versus GPT-5 variants. Overall, the core takeaway is that Gemini 3 plus Antigravity is moving from "generate code" to "build and iterate on working interactive software," with multimodal context and agent workflows doing much of the heavy lifting, even if speed, edge-case bugs, and some mechanics still require human steering.
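The "bouncing ball with tunable gravity" style of community demo boils down to a small physics loop. A minimal sketch of that pattern, assuming a semi-implicit Euler integrator (the names, units, and values here are illustrative, not taken from any demo in the video):

```javascript
// Hypothetical sketch of a bouncing-ball simulation with tunable gravity.
function createBall(y, vy) {
  return { y, vy }; // height above the floor (px) and vertical velocity (px/s)
}

// One semi-implicit Euler step: update velocity from gravity first,
// then position from the new velocity, then resolve the floor collision.
function step(ball, dt, { gravity = 980, restitution = 0.8 } = {}) {
  ball.vy -= gravity * dt;            // gravity accelerates the ball downward
  ball.y += ball.vy * dt;             // integrate position
  if (ball.y < 0) {                   // hit the floor at y = 0
    ball.y = 0;
    ball.vy = -ball.vy * restitution; // bounce, losing some energy
  }
  return ball;
}
```

Exposing `gravity` and `restitution` as per-call options is what makes such a simulation "tunable"; a browser demo would call `step` inside `requestAnimationFrame` and redraw the ball each frame.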

Cornell Notes

Gemini 3 is presented as a strong step forward in multimodal understanding and agent-style coding, with a context window of up to a million tokens and the ability to work with images and videos (including audio). Hands-on demos emphasize that it can generate large, runnable interactive web projects, often in a single HTML file, such as a 3D water physics scene, a solar system simulation, and playable physics games (an underwater squid game and a jelly platformer). Google's Antigravity platform is highlighted as the practical layer that orchestrates these tasks: it unzips asset packs, plans project structure, writes code, and iterates when errors appear. While agent browser actions can be slow and some mechanics initially fail (freezes, termination errors), iterative fixes eventually produce working games and smoother debugging workflows.

What concrete capabilities make Gemini 3 feel “agentic” rather than just a chat model in these demos?

The transcript repeatedly ties Gemini 3 to code that runs immediately in a browser and to workflows that involve planning, file generation, and interaction. Examples include generating a fully interactive 3D water physics scene (click-to-drop objects), producing a navigable solar system simulation with time progression and orbital motion, and creating playable physics games delivered as simple HTML files. In addition, Antigravity is used to orchestrate higher-level tasks: unzipping asset packs, generating project folders (HTML/CSS/JS), and iterating on bugs using an agent workflow rather than only producing static code.

How do the demos demonstrate multimodal strength beyond text-only prompting?

Multimodality shows up in two ways. First, Gemini 3 is described as handling images and videos with audio, which matters for tasks like sign-language recognition using webcam video (from community demos). Second, the Antigravity workflow includes uploading screenshots to help the agent debug and proceed when the initial game build fails or freezes. The transcript also claims the model can analyze visual assets (PNG files) to decide what they represent (character, enemy, floor tile), then incorporate them into the generated game.

What kinds of interactive web-game mechanics does Gemini 3 generate, and how reliable are they?

Generated mechanics include physics-based movement, combat/defense actions, resource-like systems (ink reservoir), particle effects, deformable objects (jelly deformation), and level progression. Reliability is mixed: the squid game is described as sometimes “broken” in movement but still playable as a prototype; the jelly platformer works with CRT scanline effects and physics deformation, though platform collisions can cause glitches. The fire platformer initially freezes when attempting to burn boxes and required an overhaul plan before it loaded with improved textures and camera behavior.
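The "rechargeable ink" escape mechanic attributed to the squid prototype can be modeled as a simple drain-and-regenerate resource. A hypothetical sketch of that pattern (the function names, capacities, and rates are assumptions for illustration, not the video's actual code):

```javascript
// Hypothetical rechargeable resource, like the squid game's ink reservoir.
function createInk(max = 100, regenPerSec = 20) {
  return { amount: max, max, regenPerSec };
}

// Try to spend `cost` ink (e.g. to release an escape cloud);
// returns whether the action succeeded.
function spendInk(ink, cost) {
  if (ink.amount < cost) return false;
  ink.amount -= cost;
  return true;
}

// Called once per frame with the elapsed time to refill the reservoir.
function regenInk(ink, dt) {
  ink.amount = Math.min(ink.max, ink.amount + ink.regenPerSec * dt);
}
```

Gating the escape action on `spendInk` returning `true` is what turns the reservoir into a pacing mechanic: the player can burst-escape, then must wait for regeneration.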

How does the transcript compare Gemini 3 to GPT-5.1 for coding demos?

Gemini 3 is portrayed as producing demos with better playability and graphical fidelity. GPT-5.1 is said to be able to make working demonstrations, but not at the same level of interactive quality. The comparison is grounded in similar “shot” prompting and the observation that Gemini 3’s outputs more consistently reach a state that feels playable rather than merely functional.

What role does Antigravity play in turning prompts into full projects?

Antigravity acts like an agent-powered development environment. It plans the project, requests permissions (including terminal access), unzips provided asset packs, and generates a complete file structure (e.g., index.html, style.css, and multiple JS modules such as physics, renderer, input, and asset manager). When errors occur, it can surface problems in a panel and feed fixes back into the chat log, enabling iterative repair rather than starting over from scratch.
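Based on the modules named in the transcript, the generated layout would look roughly like this (the directory names and exact file names beyond those mentioned are illustrative, not observed output):

```
fire-platformer/
├── index.html         (entry point, loads the JS modules)
├── style.css
├── assets/            (unzipped Creative Commons asset pack)
└── js/
    ├── physics.js
    ├── renderer.js
    ├── input.js
    └── assetManager.js
```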

Why do browser-based agent tasks (like building a PC list) struggle in this account?

The transcript attributes the slowdown to the agent’s navigation and research pace. In the PC Part Picker demo, Gemini 3’s agent opens the site and begins selecting parts but takes a long time, with uncertainty about whether it’s thinking or simply slow. The agent also stops before producing a fully ready-to-use link, suggesting timeouts, budget constraints, or difficulty finding affordable components.

Review Questions

  1. Which Gemini 3 demo outputs are described as fully interactive in-browser experiences, and what specific interaction (clicking, movement, time controls) does each one include?
  2. How does Anti-gravity’s workflow handle errors during project generation, and what evidence suggests it can iterate rather than only generate once?
  3. What differences in playability and fidelity are claimed between Gemini 3 and GPT-5.1, and which examples are used to support that comparison?

Key Points

  1. Gemini 3 is presented as a multimodal, large-context model (up to a million tokens) that can generate runnable, interactive web experiences from prompts.
  2. Hands-on demos emphasize 3D physics and interactive mechanics, including click-to-interact water simulations and a solar system simulation with orbital motion and time progression.
  3. Gemini 3 can produce sizable single-file web game prototypes (often hundreds of lines of code), including a third-person squid game and a deformable jelly physics platformer.
  4. Antigravity is positioned as the practical agent layer that plans projects, unzips asset packs, writes full file structures, and iterates when bugs appear.
  5. Browser-based "computer use" tasks can be capable but slow, with research/navigation taking long enough to risk incomplete outputs.
  6. Compared with GPT-5.1, Gemini 3 is described as producing demos with better playability and graphical fidelity, even when GPT-5.1 can still reach a working state.
  7. Community examples extend the model's reach to 3D generation, webcam-based sign-language recognition, and interactive physics simulations, reinforcing the coding-and-creation theme.

Highlights

Gemini 3 generated a fully interactive 3D water physics scene where users can click anywhere to drop a lemon, with reflections and wave behavior.
A solar system simulation was produced with correct orbital motion, time tracking (days since start), and navigable 3D views including twinkling lights and 3D stars.
Antigravity successfully created a full 2D physics fire platformer project from asset packs, after initial freezes and agent termination errors, showing an agent-driven build-and-debug workflow.
