AI is BOOMING! Google CRUSHES it, Open AI Overhauls Chat Memory, Open Source models & MORE!
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
AI’s momentum is accelerating across text, image, video, audio, and infrastructure—highlighted by OpenAI’s new ChatGPT “extended memory” feature that can reference past conversations to deliver more personalized responses, plus Google’s push to make multimodal tools and faster models broadly usable via APIs.
The most consequential product change is ChatGPT’s extended memory. Instead of relying only on saved memories, the system can draw on a user’s prior chats to shape answers around preferences, interests, and even conversational style. The result is described as smoother, more tailored interactions that can feel “spooky” in how accurately the assistant reflects personality and communication patterns. There’s also an acknowledgement of occasional hallucinations when users ask very specific questions, with quick self-correction after being challenged. Users can opt out or disable the feature, but the core shift is clear: personalization is moving from explicit memory entries to broader conversational recall.
Google’s week leans heavily into practical deployment. Firebase Studio, positioned as an AI “vibe coding” platform, runs on Gemini under the hood, though not Google’s strongest coding model, Gemini 2.5 Pro. Early feedback is mixed: some users report weak results when generating apps, while others say it works better with tweaks—though environment issues at launch remain a concern. Google also expanded its generative media lineup and capabilities: Gemini 2.5 Flash is live, and the Veo 2 video model is now publicly available through Gemini’s interface and API, including inpainting and outpainting. The API also supports more cinematic-style controls such as camera presets (like panning) and first/last-frame features—aimed at teams producing longer-form or commercial content.
On the model-performance front, Runway introduced Gen-4 Turbo, marketed as five times faster and half the cost of the original Gen-4, trading some prompt coherence and quality for speed. In video generation, a separate research thread on “one-minute video generation with test-time training” produces coherent, minute-long sequences inspired by classic Tom and Jerry dynamics—suggesting longer, story-consistent outputs may be within reach. Another video tool, Higgsfield AI, emphasizes camera-work control, combining multiple motion controls in a single shot (including moves not possible with real cameras) and releasing new motion presets focused on speed, tension, and cinematic impact. LTX Studio added actor consistency by letting users train custom characters from reference images, aiming to keep faces and styles aligned across shots.
Audio and agent tooling also advanced. ElevenLabs added an MCP server that lets Claude and Cursor access its audio platform through text prompts, enabling use cases like voice agents for outbound calls. It also upgraded professional voice cloning to produce higher-quality voiceovers that sound more like the user.
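To make the MCP integration concrete, the sketch below shows how an MCP server is typically registered in a Claude Desktop `claude_desktop_config.json` file. The package name (`elevenlabs-mcp`), launcher (`uvx`), and environment variable are assumptions based on common MCP conventions, not details confirmed in the video—check ElevenLabs’ own documentation for the exact values.

```json
{
  "mcpServers": {
    "elevenlabs": {
      "command": "uvx",
      "args": ["elevenlabs-mcp"],
      "env": {
        "ELEVENLABS_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

Once a client like Claude or Cursor loads this config, the assistant can invoke the server’s audio tools (text-to-speech, voice selection, and similar) directly from a conversation—which is how agent-style use cases such as outbound voice calls get wired together.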
Finally, the infrastructure layer matters: Google unveiled “Ironwood,” its seventh-generation TPU, built for AI inference and positioned to compete with Nvidia GPUs on cost and data-access speed. Meanwhile, xAI’s Grok 3 API finally launched with tiered pricing that includes a highly competitive Grok 3 Mini option, and independent evaluations suggest strong performance against several major models—while still trailing Gemini 2.5 Pro.
Taken together, the week’s through-line is deployment: memory that personalizes, APIs that operationalize multimodal generation, and hardware designed to make inference cheaper and faster—so AI capabilities can scale beyond demos into real workflows.
Cornell Notes
ChatGPT’s new extended memory feature lets the assistant reference a user’s past chats (not just saved memories) to produce more personalized, context-aware responses. Google’s releases focus on making AI coding and multimodal generation more usable through platforms like Firebase Studio and API-accessible image/video tools, including inpainting/outpainting and camera-style controls. Video and character generation are improving on coherence and consistency, with research on one-minute story-like generation and tools adding actor consistency from reference images. Audio tooling advanced via ElevenLabs’ MCP server for agent-style access and improved professional voice cloning. Underpinning it all, inference hardware like Google’s Ironwood TPU targets cheaper, faster deployment.
What’s the practical difference between ChatGPT “saved memories” and the new “extended memory” feature?
Why does Firebase Studio’s performance look uneven in early feedback?
What capabilities did Google add via Veo 2’s public API access?
How do the video-generation improvements differ between research and tools in this roundup?
What does ElevenLabs’ MCP server enable for developers and AI agents?
Why is Google’s Ironwood TPU relevant even when model quality is the headline?
Review Questions
- Which specific ChatGPT capability change is most likely to affect day-to-day personalization: saved memories, extended memory, or both? Explain how extended memory works differently.
- What trade-offs are implied by Runway Gen-4 Turbo’s “five times faster and half the cost” positioning?
- How do Higgsfield AI and LTX Studio each target a different pain point in video generation (camera control vs actor consistency)?
Key Points
1. ChatGPT’s extended memory can reference past chats to tailor responses around preferences and conversational style, with an option to disable it.
2. Firebase Studio uses Gemini but not Gemini 2.5 Pro, and early user reports suggest uneven results plus launch-time environment issues.
3. Veo 2 is now publicly available through Gemini and via API, adding inpainting/outpainting and structured camera controls like panning and first/last frames.
4. Runway’s Gen-4 Turbo prioritizes speed and cost, accepting reduced prompt coherence and quality compared with the full Gen-4 model.
5. Video progress is splitting across research (longer coherent story-like generation) and tools (camera-work control and actor consistency from reference images).
6. ElevenLabs expanded agent integration with an MCP server for Claude and Cursor, and improved professional voice cloning quality.
7. Google’s Ironwood TPU targets cheaper, faster inference as an alternative to relying solely on Nvidia GPUs.