SO MUCH AI NEWS! 60s AI Video, Full body AI Acting, & Open Source Slam Dunks!
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
AI agents are moving from "chat" to "do." OpenAI's new ChatGPT agent is positioned as a near-human performer on white-collar tasks: it can use a computer, write code, and build file systems. Access rolls out first to ChatGPT Pro users, with ChatGPT Plus users following on Monday; free users are still left out. The pitch is that this is not just another language model but an agentic system that can operate in real workflows, and early benchmark references place it near top-tier closed models on demanding evaluations.
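To make the "chat" versus "do" distinction concrete, here is a minimal, illustrative sketch of an agentic loop: rather than returning a single reply, the system repeatedly picks a tool, runs it, and feeds the result back into its reasoning. The planner here is a hypothetical stub, not any vendor's API; a real agent would call an LLM at that step.

```python
# Minimal agentic loop sketch. `stub_model` stands in for an LLM planner
# (hypothetical; real agents like ChatGPT agent call a hosted model here).

def stub_model(history):
    """Decide the next (action, argument) pair from the transcript so far."""
    if not any(step[0] == "calc" for step in history):
        return ("calc", "6 * 7")            # first, use the calculator tool
    return ("finish", str(history[-1][1]))  # then report the tool's result

# Tool registry: the agent's "hands." A real system would add a browser,
# a code interpreter, file access, etc.
TOOLS = {"calc": lambda expr: eval(expr, {"__builtins__": {}})}

def run_agent(task, model=stub_model, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):          # cap steps so the loop always ends
        action, arg = model(history)
        if action == "finish":
            return arg
        history.append((action, TOOLS[action](arg)))
    return None

print(run_agent("What is 6 * 7?"))  # -> 42
```

The loop structure (plan, act, observe, repeat) is what separates an agent from a one-shot chatbot reply, regardless of which model or tools sit inside it.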
Open-source momentum is matching that shift, with several releases aiming to make agentic capability cheaper and more customizable. Moonshot AI's Kimi K2 is billed as a major open model: 1 trillion total parameters with 32B active per token, strong performance on SWE-bench Verified, and a particular emphasis on coding and agent-style tasks. The weights and code are available on Hugging Face and GitHub, and the ecosystem is framed as open and accessible enough to compete with closed models for builders, especially because it can be modified and fine-tuned for targeted agent categories. In parallel, an open research-style demo is used to generate a long, citation-heavy report (a "top 50 snack foods" list), reinforcing the claim that the model can handle multi-step research with follow-up questioning.
AWS is also pushing agent infrastructure into mainstream deployment. Amazon Bedrock AgentCore is presented as plug-and-play infrastructure for agentic AI: a serverless runtime with built-in memory, a code interpreter, and a browser tool, all inside the Amazon Bedrock stack. For deeper control, AWS adds fine-tuning for Amazon Nova models inside SageMaker, plus an AWS AI League challenge for fine-tuning lightweight models. The message is clear: agent builders get both tooling and a path to production.
On the video-generation front, open-source and commercial systems both take steps toward more controllable acting. Pusa 1.0 (open source) is introduced as a more efficient video model built on Wan, using vectorized timestep adaptation and claiming far lower training cost (roughly 200x cheaper) along with faster generation. It also extends the underlying Wan capabilities with text-to-video, start/end frames, and video extension. Runway ML's Act Two is framed as a leap over Act One: instead of tracking only facial performance, it tracks full-body motion, including hands, legs, and torso, and produces more dynamic, scene-like outputs. The result is better suited to storytelling, even if artifacts such as hand-geometry issues still appear.
Other creative tools broaden the pipeline from prompt to finished media. OpenArt's "OpenArt Story" turns scripts, beats, or characters into one-minute videos with motion, music, and narrative arcs, positioning itself as a competitor to similar "prompt-to-story" offerings. LTX Video's new open model targets native 60-second generation on consumer GPUs, with depth control and LoRAs, and emphasizes consistency: keeping characters and settings stable across longer clips.
Finally, smaller but notable product updates land across the stack: Record mode arrives for ChatGPT Plus users globally, though only on the macOS desktop app; a potential new OpenAI image model is teased in a tweet from Andrew Mayne; Suno AI releases V4.5 Plus for vocal/instrumental swapping and playlist-based song creation; and Higgsfield introduces a UGC builder that behaves like a "digital actor in a box," letting users set the emotional tone of an avatar's performance. The throughline is practical: agents, longer video, and more controllable generation are becoming accessible, either via closed platforms or via open weights that invite customization.
Cornell Notes
The central theme is that AI is shifting from generating text or short clips to performing tasks and producing longer, more controllable media. OpenAI's new ChatGPT agent is positioned as a computer-using, code-writing system that can handle white-collar workflows, with the rollout starting for Pro and expanding to Plus on Monday. Open-source releases like Moonshot AI's Kimi K2 aim to deliver strong coding and agentic performance with open weights and code, enabling fine-tuning for specific agent categories. On the video side, open models such as Pusa 1.0 and LTX Video's 60-second system push longer generation onto cheaper hardware, while Runway ML's Act Two improves full-body motion tracking for more scene-like outputs. These changes matter because they lower the barriers to building real products, not just demos.
What makes OpenAI’s ChatGPT agent different from a standard chatbot?
Why is Kimi K2 framed as a breakthrough for open-source builders?
How does AWS's Amazon Bedrock AgentCore fit into the agent trend?
What improvements does Runway ML’s Act Two claim over Act One?
What does the open-source video push aim to solve: quality, length, or cost?
Review Questions
- Which capabilities listed for ChatGPT’s agent (e.g., code writing, file systems, computer use) most directly change what users can accomplish compared with a text-only model?
- How do open-weight releases like Kimi K2 change what builders can do versus relying only on closed APIs?
- Across the video tools mentioned (Pusa 1.0, Act Two, LTX Video), what specific dimension—length, body control, or hardware cost—gets the most emphasis?
Key Points
1. OpenAI's ChatGPT agent is positioned as a computer-using, code-writing system that can build file systems, with access starting for Pro and expanding to Plus on Monday.
2. Moonshot AI's Kimi K2 is marketed as a major open-source agentic/coding model with open weights and code, aiming to compete with closed models for builders.
3. AWS is pushing agent infrastructure via Amazon Bedrock AgentCore, adding serverless agent runtime features such as memory, a code interpreter, and browser tools inside Amazon Bedrock.
4. Runway ML's Act Two upgrades acting generation from face-only tracking to full-body motion, improving scene dynamism for storytelling.
5. Open-source video models are moving toward longer outputs and cheaper hardware: Pusa 1.0 emphasizes efficiency and added controls, while LTX Video targets native 60-second generation on consumer GPUs.
6. OpenArt's OpenArt Story focuses on prompt-to-one-minute narrative videos with motion, music, and a narrative arc, aiming to differentiate itself from similar "story synthesis" tools.
7. A set of smaller product updates (ChatGPT macOS Record mode, a teased new OpenAI image model, Suno AI's V4.5 Plus audio features, and Higgsfield's emotion-tuned avatar builder) shows rapid iteration across the media stack.