SO MUCH AI NEWS! 60s AI Video, Full body AI Acting, & Open Source Slam Dunks!

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI’s ChatGPT agent is positioned as a computer-using, code-writing system that can build file systems, with access starting for Pro and expanding to Plus on Monday.

Briefing

AI agents are moving from "chat" to "do," with OpenAI positioning its new ChatGPT agent as a near-human performer on white-collar tasks: using a computer, writing code, and building file systems. Access is rolling out first to ChatGPT Pro users, while ChatGPT Plus users begin getting it Monday; free users are still left out. The pitch is that this is not just another language model but an agentic system that can operate in real workflows, and early benchmark references place it near top-tier closed models on demanding evaluations.

Open-source momentum is matching that shift, with several releases aiming to make agentic capability cheaper and more customizable. Moonshot AI's Kimi K2 is billed as a major open model: 1 trillion total parameters with 32B active, strong performance on SWE-bench Verified, and particular emphasis on coding and agent-like tasks. The weights and code are available on Hugging Face and GitHub, and the ecosystem is framed as open and accessible enough to compete with closed models for builders, especially because it can be modified and fine-tuned for targeted agent categories. In parallel, an open research-style demo is used to generate a long, citation-heavy report (a "top 50 snack foods" list), reinforcing the claim that the model can handle multi-step research with follow-up questioning.

AWS is also pushing agent infrastructure into mainstream deployment. Amazon Bedrock AgentCore is presented as plug-and-play infrastructure for agentic AI: a serverless runtime with built-in memory, a code interpreter, and a browser tool, all inside the Amazon Bedrock stack. For deeper control, AWS adds fine-tuning for Amazon Nova models inside SageMaker, plus an AWS AI League challenge for fine-tuning lightweight models. The message is clear: agent builders get both tooling and a path to production.

On the video-generation front, open-source and commercial systems are both taking steps toward more controllable acting. Pusa 1.0 (open source) is introduced as a more efficient video model than Wan, using vectorized timestep adaptation and claiming far lower training cost (200x cheaper) and faster generation. It also extends the underlying Wan capabilities with text-to-video, start/end frames, and video extension. Runway ML's Act Two is framed as a leap over Act One: instead of face-only acting, it tracks full-body motion (hands, legs, torso) and produces more dynamic, scene-like outputs. The result is better for storytelling, even if artifacts such as hand-geometry issues still appear.

Other creative tools broaden the pipeline from prompt to finished media. OpenArt's "OpenArt Story" turns scripts, beats, or characters into one-minute videos with motion, music, and narrative arcs, positioning itself as a competitor to similar "prompt-to-story" offerings. LTX Video's new open model targets native 60-second generation on consumer GPUs, with depth control and LoRA support, and emphasizes consistency: keeping characters and settings stable across longer clips.

Finally, smaller but notable product updates land across the stack: Record mode arrives for ChatGPT Plus users globally, but only in the macOS desktop app; a potential new OpenAI image model is teased via a tweet from Andrew Mayne; Suno releases v4.5+ for vocal/instrumental swapping and playlist-based song creation; and Higgsfield introduces a UGC builder that behaves like a "digital actor in a box," letting users set the emotional tone of an avatar's performance. Together, the throughline is practical: agents, longer video, and more controllable generation are becoming accessible, either via closed platforms or open weights that invite customization.

Cornell Notes

The central theme is that AI is shifting from generating text or short clips to performing tasks and producing longer, more controllable media. OpenAI's new ChatGPT agent is positioned as a computer-using, code-writing system that can handle white-collar workflows, with rollout starting for Pro and expanding to Plus on Monday. Open-source releases like Moonshot AI's Kimi K2 aim to deliver strong coding and agentic performance with open weights and code, enabling fine-tuning for specific agent categories. On the video side, open models such as Pusa 1.0 and LTX Video's 60-second system push longer generation on cheaper hardware, while Runway ML's Act Two adds full-body motion tracking for more scene-like outputs. These changes matter because they lower barriers to building real products, not just demos.

What makes OpenAI’s ChatGPT agent different from a standard chatbot?

It’s described as an agent that can use a computer, write code, and create file systems—moving beyond text-only responses into actions that resemble workflow execution. Access is tied to subscriptions: ChatGPT Pro already has it, ChatGPT Plus begins receiving access Monday, and free users do not yet get it. Benchmark references are used to suggest it’s approaching human-level performance on white-collar tasks and ranking near top models on tough evaluations.

Why is Kimi K2 framed as a breakthrough for open-source builders?

Moonshot AI's Kimi K2 is presented as a large open model with 1 trillion total parameters (32B active), strong results on SWE-bench Verified, and an emphasis on coding plus agentic tasks. Crucially, weights and code are available on Hugging Face and GitHub, and the model is positioned as modifiable, potentially fine-tunable for targeted agent categories. A demo report generation is used to illustrate multi-step research behavior with citations and follow-ups.

How does AWS's Amazon Bedrock AgentCore fit into the agent trend?

Amazon Bedrock AgentCore is pitched as plug-and-play infrastructure for agentic AI inside the Amazon Bedrock stack. It includes a serverless runtime plus built-in memory, a code interpreter, and a browser tool, reducing integration friction. AWS also adds fine-tuning for Amazon Nova models inside SageMaker and launches an AWS AI League challenge for fine-tuning lightweight models, tying agent development to deployment and learning resources.

What improvements does Runway ML’s Act Two claim over Act One?

Act One focused on face acting by mapping recorded facial performance onto an AI character. Act Two expands tracking to full-body motion—hands, legs, torso, and more—so generated clips look more like dynamic movie shots. The transcript notes remaining imperfections (for example, hand geometry issues), but emphasizes better suitability for storytelling due to body-level acting control.

What does the open-source video push aim to solve: quality, length, or cost?

It targets all three, with different models emphasizing different tradeoffs. Pusa 1.0 claims more efficient training (200x cheaper) and faster generation while adding capabilities like text-to-video and video extension. LTX Video’s new open model focuses on native 60-second clips that run on consumer GPUs and maintain consistency across the clip, while acknowledging it won’t match the highest-end state-of-the-art quality.

Review Questions

  1. Which capabilities listed for ChatGPT’s agent (e.g., code writing, file systems, computer use) most directly change what users can accomplish compared with a text-only model?
  2. How do open-weight releases like Kimi K2 change what builders can do versus relying only on closed APIs?
  3. Across the video tools mentioned (Pusa 1.0, Act Two, LTX Video), what specific dimension—length, body control, or hardware cost—gets the most emphasis?

Key Points

  1. OpenAI's ChatGPT agent is positioned as a computer-using, code-writing system that can build file systems, with access starting for Pro and expanding to Plus on Monday.

  2. Moonshot AI's Kimi K2 is marketed as a major open-source agentic/coding model with open weights and code, aiming to compete with closed models for builders.

  3. AWS is pushing agent infrastructure via Amazon Bedrock AgentCore, adding serverless agent runtime features like memory, a code interpreter, and browser tools inside Amazon Bedrock.

  4. Runway ML's Act Two upgrades acting generation from face-only tracking to full-body motion, improving scene dynamism for storytelling.

  5. Open-source video models are moving toward longer outputs and cheaper hardware: Pusa 1.0 emphasizes efficiency and added controls, while LTX Video targets native 60-second generation on consumer GPUs.

  6. OpenArt's OpenArt Story focuses on prompt-to-one-minute narrative videos with motion, music, and a narrative arc, aiming to differentiate from similar "story synthesis" tools.

  7. A set of smaller product updates (ChatGPT macOS Record mode, a teased new OpenAI image model, Suno's v4.5+ audio features, and Higgsfield's emotion-tuned avatar builder) shows rapid iteration across the media stack.

Highlights

ChatGPT agent is framed as a step toward real task execution: using a computer, writing code, and creating file systems, with benchmark references placing it near top-tier models on white-collar work.
Kimi K2's open release (weights and code on Hugging Face and GitHub) is pitched as enabling fine-tuning for agentic categories, not just running a model behind an API.
Runway ML’s Act Two shifts from face acting to full-body tracking, making generated clips feel more like movie scenes even when imperfections remain.
LTX Video’s open model claims native 60-second generation on consumer GPUs with strong clip consistency—an unusual combination for open video systems.
