
AI Weather Warning: Gemini 3, K2 Thinking, TTS, & more!

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

ChatGPT adds the ability to pause long-running prompts and insert new context without restarting; Sora gains a daily leaderboard and paid generation top-ups; Stability AI largely prevails against Getty Images' secondary infringement claims; open-source Kimi K2 Thinking posts strong benchmarks against closed models; and Gemini 3 is teased as imminent through a wave of demos.

Briefing

AI product updates are accelerating across text, video, audio, and even game-like simulation—while a major legal ruling and a new open-source “thinking” model reshape what’s feasible and what’s defensible.

The biggest near-term software change lands inside ChatGPT: users can pause long-running queries and inject new context without restarting or losing progress. The workflow described is straightforward—send a prompt, let the model begin “thinking,” then use stop/update controls to add requirements midstream. That matters because it targets a common failure mode in deep research: forgetting a constraint, adding a new angle, or refining the goal after the model has already started. The transcript also flags a potential rollout mismatch—some users with ChatGPT Plus reportedly don’t see the pause-and-update controls yet—so access timing may be uneven.

On the AI video front, Sora's updates focus on both social engagement and consumption limits. A daily leaderboard ranks creations by categories like "cameo," "remixed," and "characters," with examples attributed to recognizable creator names. The push toward broader participation is framed around access expanding to more countries and the arrival of the Android app. Sora also introduces clearer accounting for remaining "video generations," plus paid top-ups: $10 for 25 generations, $20 for 50, and $40 for 100. The transcript notes that free caps appear more restrictive than before, but the ability to buy more generations directly is positioned as a practical fix.
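One detail worth noticing in the tiers above: the pricing doesn't discount at higher volumes. A quick sketch (tier prices taken from the transcript; the flat-rate observation is simple arithmetic, not a claim from the video) makes this explicit:

```python
# Sora top-up tiers as reported: (price in USD, generations granted).
tiers = [(10, 25), (20, 50), (40, 100)]

# Cost per generation for each tier.
per_gen = [price / gens for price, gens in tiers]
print(per_gen)  # every tier works out to $0.40 per generation
```

So the larger bundles buy convenience rather than a volume discount, at least at the prices quoted.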

A separate, high-stakes development comes from Getty Images' copyright fight against Stability AI. The court dismissed the secondary infringement claims tied to Stable Diffusion training, finding that the model weights and outputs did not reproduce Getty images under the relevant CDPA sections. The ruling hinges on the idea that models don't store actual copies of images; instead, they learn distributions, described in lay terms as capturing "intrinsic values" rather than memorized pixels. Trademark claims succeeded only narrowly, specifically around synthetic watermarks in older model versions. Overall, the decision is portrayed as a meaningful win for developers facing similar litigation.

The transcript then pivots to open-source performance: Kimi K2 Thinking, an open model from China, posts benchmark results that reportedly beat GPT-5 and Claude Sonnet 4.5 on multiple tests while costing less than Sonnet. It's framed as a "thinking" model with tool use, showing strong results on Humanity's Last Exam (text-only), Seal-0 (information collection), and coding benchmarks like LiveCodeBench and SWE-bench Verified, though the coding crown is said to belong to GPT-5. The model's appeal is also practical: code and weights are available on Hugging Face, enabling community fine-tuning and optimization for consumer hardware.

Finally, the Gemini 3 "storm" is treated as imminent, with multiple demos aimed at showing capability. Examples include LLM-driven music generation ("Ascension Protocol"), a 3D planet visualizer with adjustable topology and atmosphere, and even code-based simulation of a Nintendo Switch-like interface and simple games. Alongside Gemini 3, "Nano Banana 2.0" (code name "Gem Pix 2") is expected to improve reference handling for image-to-video generation, plus an open-source multi-angle LoRA for rotating images while preserving fine details. Other updates include Veo 3.1 gaining camera adjustment controls and xAI's Grok video model demonstrating spatial problem-solving in a maze. The throughline: models are getting more controllable, more interactive, and more legally survivable, even as competition and safety concerns remain unresolved.

Cornell Notes

ChatGPT is adding a workflow for long-running prompts: users can pause generation and add new context without restarting, which would make deep research and iterative requirements far less error-prone. Sora is expanding with a daily leaderboard, clearer limits on remaining "video generations," and paid top-ups to continue generating when free quotas run out. Getty Images' copyright case against Stability AI largely ends in Stability's favor, with the court rejecting secondary infringement claims because models learn distributions rather than storing copies of images. Open-source Kimi K2 Thinking posts strong benchmark results against major closed models while remaining cheaper, with code and weights available on Hugging Face. Meanwhile, Gemini 3 and "Nano Banana 2.0" are teased through demos emphasizing music, 3D visuals, and controllable image/video generation.

What change in ChatGPT directly targets the “I forgot something” problem during long research prompts?

ChatGPT introduces controls to interrupt extended thinking and add new context without restarting or losing progress. The described flow: send a prompt, let the model begin thinking, then use stop/update symbols to pause and insert additional requirements. That’s meant to support iterative refinement—e.g., adding a constraint or changing the goal after the model has already started—without discarding earlier work. The transcript also notes some users may not yet see the controls even with ChatGPT Plus, suggesting rollout may be account-dependent.

How does Sora’s new leaderboard and generation accounting change user behavior?

Sora adds a daily leaderboard that ranks creations by categories such as “cameo,” “remixed,” and “characters,” with examples attributed to specific creators. That social layer is paired with clearer generation limits: the app and website show how many video generations remain and when more will be available. When free generations run out, users can pay to buy more—$10 for 25, $20 for 50, and $40 for 100—turning generation usage into a more predictable, monetizable workflow.

Why was the Getty Images vs. Stability AI ruling considered a major win for AI developers?

The court dismissed secondary infringement claims tied to Stable Diffusion training, finding no reproduction of Getty images and no infringing model weights or outputs under the cited CDPA sections. A key reasoning point: for secondary infringement to hold, the models would need to store actual copies of images, not merely learn distributions. The transcript summarizes this as learning intrinsic values that characterize images rather than memorizing and storing the images themselves. Trademark claims still succeeded narrowly for synthetic watermarks in older model versions.

What makes Kimi K2 Thinking notable beyond raw benchmark numbers?

It's positioned as an open-source "thinking" model with tool use, posting benchmark results that reportedly beat GPT-5 and Claude Sonnet 4.5 on several tests while costing less than Sonnet. Just as important, the transcript emphasizes transparency and accessibility: code and weights are available on Hugging Face, enabling the community to optimize, trim for consumer hardware, and learn from the model's construction. The tradeoff noted is usability out of the box, since open models may require more setup than closed systems.

Which Gemini 3 demos are used to argue the model can handle more than text generation?

Multiple demos aim to show end-to-end generation from prompts: an LLM-driven music generator (“Ascension Protocol”) that produces piano-like compositions; a 3D planet visualizer with adjustable topology, atmosphere density/color, ice caps, and surface roughness; and a code-based simulation of a Nintendo Switch-like interface that includes controller layout, button/stick placement, touchscreen browsing, and simple game behavior (with noted bugs/glitches). The transcript also claims Gemini 3 can simulate a rudimentary game console and generate interactive elements from code.

Review Questions

  1. Which specific capability in ChatGPT reduces the cost of changing requirements mid-generation, and why does it matter for deep research workflows?
  2. What legal distinction did the court rely on in the Getty Images case—copies versus distributions—and how did that affect the outcome?
  3. How do open-source models like Kimi K2 Thinking change the competitive landscape compared with closed models, based on the transcript's discussion of benchmarks and accessibility?

Key Points

  1. ChatGPT adds the ability to pause long-running prompts and insert new context without restarting, enabling iterative deep research.

  2. Sora introduces a daily leaderboard and clearer "video generations" limits, plus paid top-ups when free quotas end.

  3. Stability AI largely defeats Getty Images' secondary copyright infringement claims because training learns distributions rather than storing image copies; trademark wins were limited to synthetic watermarks in older models.

  4. Kimi K2 Thinking is an open-source, tool-using "thinking" model that posts strong benchmark results against major closed models while remaining cheaper.

  5. Gemini 3 is supported by demos spanning music generation, adjustable 3D planet visualization, and code-based simulation of a Nintendo Switch-like interface.

  6. Nano Banana 2.0 ("Gem Pix 2") is expected to improve reference adherence and the ability to create entirely new scenes from image inputs.

  7. Veo 3.1 adds camera adjustment controls (position and motion), while xAI's Grok video model demonstrates spatial problem-solving in a maze scenario.

Highlights

ChatGPT’s new pause-and-update workflow targets a real research pain point: changing requirements midstream without losing progress.
The Getty Images ruling turns on a crucial technical/legal line—models learning distributions instead of storing copies—leading to dismissal of secondary infringement claims.
Kimi K2 Thinking pairs strong benchmark performance with open code and weights on Hugging Face, inviting community optimization.
Sora’s generation limits are now explicit and monetized, with straightforward pricing for additional video generations.
Gemini 3 demos go beyond visuals into code-driven simulation, including a rudimentary Nintendo Switch-like UI and simple game behavior.

Topics

  • ChatGPT Updates
  • Sora Leaderboard
  • Stability AI Getty Ruling
  • Kimi K2 Thinking
  • Gemini 3 Demos
  • Nano Banana 2.0
  • Open-Source Models