
ChatGPT Plugins go PUBLIC, DALL-E Upgrade, Google PaLM 2! | AI News

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Stability AI’s Stable Animation ships as an SDK for developers, enabling text-to-animation plus optional initial image or video inputs with parameter controls.

Briefing

AI’s biggest near-term shift is the race to turn models into everyday tools—inside email, maps, search, productivity suites, and chat—while image and video generation gets more “developer-ready” and more prompt-faithful. Stability AI is pushing that direction with Stable Animation, an SDK built on Stable Diffusion that lets developers generate animations from text prompts, tweak parameters (like typical diffusion controls), and even start from initial images or videos plus text. The outputs can run for longer sequences than earlier tools, though motion can remain less coherent over time. The practical takeaway: animation is moving from novelty demos toward integration-ready software components.

Google’s I/O announcements put the same theme on a much larger scale: PaLM 2 is being embedded across products and paired with new editing and workflow features. In Gmail, “Help me write” uses PaLM 2 to draft replies using the context of the current email thread—illustrated with a refund request scenario. Google Maps adds AI-driven routing that can optimize for specific interests (like landmark-heavy routes in New York City), plus a “bird’s-eye view” experience and improved real-time traffic and weather detection. A “Magic Editor” brings photo manipulation closer to consumer-friendly Photoshop-like edits, including background changes and object repositioning (such as moving a bench and swapping in new elements).

Under the hood, PaLM 2 is positioned as a family of models in multiple sizes, including a smallest “Gecko” variant designed to run offline on a smartphone. Google claims stronger logic and reasoning than the original PaLM, multilingual training across 100+ languages, and improved coding versus earlier versions. The most consequential capability is domain fine-tuning: organizations can train PaLM 2 on their own specialized data, which Google frames as a path to high-performance medical or other industry-specific assistants. A medical fine-tuned PaLM 2 variant is also aimed at multimodal tasks like interpreting X-rays and mammograms.

Google also expands AI into its broader workspace: Drive, Docs, Sheets, Slides, and more get AI assistance via Duet AI-style features that support writing, prompting, and contextual suggestions. For Bard, Google is rolling out “extensions” that function like ChatGPT plugins—opening Bard to third-party integrations. The rollout is broad (available across 180+ countries), and early examples include Adobe Firefly for image generation directly inside Bard, alongside other partners such as Wolfram Alpha, Spotify, and retail or utility services. Google Search is also getting AI-style follow-up questioning and web-aware answers, aligning with what competitors already offer.

OpenAI’s counter-move is to accelerate access to web browsing and plugins for ChatGPT Plus users, moving from alpha to beta over the next week. That includes internet access via a browsing capability and a new DALL-E 2 plugin for image generation. The DALL-E 2 update is framed as a leap in coherence—especially for text inside images—plus improved prompt following compared with earlier DALL-E versions and with Bing’s “supercharged” DALL-E 2. In side-by-side comparisons, the new DALL-E 2 is shown producing more complete, element-accurate scenes (including legible text), while Midjourney is portrayed as artistically stronger but less reliable at hitting every prompt detail.

Meanwhile, IBM’s Dromedary instruction-tuned open model and OpenAI’s internal use of GPT-4 to better understand model internals add another layer: the race isn’t only about raw output quality, but also about controllability, safety tuning, and how quickly capabilities can be operationalized. The overall message is clear—Google and OpenAI are compressing timelines, pushing models into daily workflows, and raising the bar for prompt-faithful multimodal generation.

Cornell Notes

The AI competition is shifting from standalone model demos to integrated products: Stability AI’s Stable Animation SDK, Google’s PaLM 2 embedded across Gmail, Maps, and Workspace, and OpenAI’s rollout of web browsing plus plugins for ChatGPT Plus. Google’s PaLM 2 is offered in multiple sizes (including an offline smartphone model) and is positioned for domain fine-tuning, enabling specialized assistants such as medical multimodal systems for interpreting X-rays and mammograms. Bard’s new “extensions” mirror the plugin concept and bring third-party tools like Adobe Firefly directly into chat. OpenAI’s DALL-E 2 plugin emphasizes higher coherence and stronger prompt-following, especially for generating readable text within images. Together, these moves show both companies compressing the gap by turning AI into tools people use immediately.

What does Stability AI’s Stable Animation SDK change for developers and creators?

Stable Animation is delivered as an SDK built on Stable Diffusion, making animation generation more “integration-ready.” It supports generating animations from a simple text prompt, adjusting diffusion-style parameters, and using initial image inputs. It also allows initial video plus text as a starting point, enabling creators to steer motion and style from existing media. The transcript notes the look is familiar to Stable Diffusion animation workflows (morphing/rapid visual changes), but the SDK’s official support and developer setup are positioned as a practical upgrade.
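To make the three input modes concrete (text only, image + text, video + text), here is a minimal Python sketch of the kind of parameter set a developer might assemble before calling the SDK. The names (`AnimationJob`, `steps`, `cfg_scale`, and so on) are illustrative assumptions modeled on typical diffusion controls, not the Stable Animation SDK’s actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnimationJob:
    """Hypothetical job spec mirroring the controls the article describes."""
    prompt: str                       # text description of the animation
    steps: int = 30                   # diffusion sampling steps
    cfg_scale: float = 7.0            # how strongly to follow the prompt
    frame_count: int = 72             # length of the output sequence
    init_image: Optional[str] = None  # optional starting image path
    init_video: Optional[str] = None  # optional source video to restyle

def describe(job: AnimationJob) -> str:
    """Report which of the three input modes a job uses."""
    if job.init_video:
        mode = "video + text"
    elif job.init_image:
        mode = "image + text"
    else:
        mode = "text only"
    return f"{mode}: {job.prompt!r}, {job.frame_count} frames"

job = AnimationJob(prompt="a paper boat drifting down a rainy street",
                   init_image="boat.png")
print(describe(job))
```

The point of the sketch is the shape of the workflow: a single declarative spec combining a prompt, diffusion-style knobs, and optional conditioning media, which is what makes the SDK “integration-ready” compared with notebook-based animation scripts.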

How is Google using PaLM 2 to embed AI into everyday Google products?

Google’s I/O announcements include PaLM 2 directly inside Gmail via “Help me write,” which drafts emails using the context of the current thread. Google Maps gets AI routing that can optimize for specific interests (e.g., landmark-focused routes) rather than only fastest paths, plus “bird’s-eye view” guidance and improved real-time traffic and weather detection. Google Workspace also receives AI assistance through Duet AI-style capabilities across Docs, Sheets, and Slides, offering writing help and contextual suggestions.

Why is domain fine-tuning a major differentiator in Google’s PaLM 2 pitch?

The transcript highlights domain-specific fine-tuning as “massive news”: organizations can train PaLM 2 on their own specialized data so the model becomes more proficient in that domain (medical knowledge is the example). It contrasts this with ChatGPT, which the transcript claims lacks built-in capability for this kind of domain fine-tuning. Google also points to a medical fine-tuned PaLM 2 variant aiming to become multimodal for tasks like interpreting X-rays and mammograms.

What does “extensions” for Bard mean in practice, and how does it compete with ChatGPT plugins?

Bard “extensions” are described as essentially the same concept as ChatGPT plugins—third-party integrations that run inside the chat experience. The transcript emphasizes the competitive angle: Bard is opening broadly (180+ countries) and adding partners such as Adobe Firefly for image generation, plus other tools like Wolfram Alpha, Spotify, and retail services. The goal is to match the plugin ecosystem and make Bard a platform for external capabilities.

What’s the key improvement claimed for OpenAI’s DALL-E 2 plugin?

The transcript frames DALL-E 2 as more coherent and better at prompt following, with special emphasis on generating text inside images. Side-by-side comparisons are used to argue that the new DALL-E 2 handles complex, multi-element prompts more reliably than earlier DALL-E versions and than Bing’s “supercharged” DALL-E 2, while Midjourney may miss required elements even when the artwork looks strong. The DALL-E 2 plugin is also presented as enabling aspect ratio control and higher detail.

How does OpenAI’s web browsing rollout change what ChatGPT can do?

OpenAI is rolling out web browsing and plugins to ChatGPT Plus users over the next week, moving from alpha to beta. The transcript claims this lets ChatGPT click through to specific websites and retrieve information more directly than search-result-only approaches. It also positions the plugin ecosystem as a way to extend capabilities beyond what the base model can do alone, including a DALL-E 2 image-generation plugin.
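For context on what “plugins” means mechanically: a ChatGPT plugin is declared through an `ai-plugin.json` manifest that points ChatGPT at an OpenAPI description of the service. The sketch below shows the general shape of such a manifest; all names and URLs are placeholders, not any real plugin’s values:

```json
{
  "schema_version": "v1",
  "name_for_human": "Example Image Tool",
  "name_for_model": "example_image_tool",
  "description_for_human": "Generate images from text prompts.",
  "description_for_model": "Use when the user asks for an image generated from a description.",
  "auth": { "type": "none" },
  "api": {
    "type": "openapi",
    "url": "https://example.com/openapi.yaml"
  },
  "logo_url": "https://example.com/logo.png",
  "contact_email": "support@example.com",
  "legal_info_url": "https://example.com/legal"
}
```

The `description_for_model` field is the interesting part: it is how the model decides when to invoke the third-party service, which is the same routing problem Bard’s “extensions” have to solve.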

Review Questions

  1. Which Google product integrations mentioned in the transcript rely on PaLM 2, and what specific user task does each one target?
  2. What does domain fine-tuning enable for PaLM 2, and why is that framed as a competitive advantage?
  3. In the DALL-E 2 comparisons, what kinds of prompt elements are used to judge coherence and prompt-following quality?

Key Points

  1. Stability AI’s Stable Animation ships as an SDK for developers, enabling text-to-animation plus optional initial image or video inputs with parameter controls.
  2. Google’s PaLM 2 is being embedded into Gmail, Google Maps, and Google Workspace to draft emails, generate interest-based routes, and assist with writing and content creation.
  3. PaLM 2’s model family includes a smallest variant (Gecko) designed for offline smartphone use, while larger models target stronger reasoning and multilingual coverage.
  4. Google’s strategy emphasizes domain fine-tuning so organizations can train PaLM 2 on their own data, with medical multimodal interpretation (X-rays, mammograms) highlighted as a goal.
  5. Bard’s new “extensions” function like ChatGPT plugins, bringing third-party tools such as Adobe Firefly into the chat experience, with a broad rollout across 180+ countries.
  6. OpenAI is expanding ChatGPT Plus with web browsing and plugins, aiming to let ChatGPT access specific websites rather than relying only on summarized search results.
  7. OpenAI’s DALL-E 2 plugin is presented as a coherence and prompt-following upgrade, especially for generating readable text inside images.

Highlights

Stable Animation turns Stable Diffusion-style generation into an SDK workflow for animations, including text prompts and initial video or image conditioning.
PaLM 2 is positioned as both a family of models (including offline smartphone use) and a platform for domain fine-tuning, with medical multimodal interpretation as a headline use case.
Bard’s extensions mirror the plugin ecosystem, with Adobe Firefly integration used as a concrete example of image generation inside chat.
OpenAI’s DALL-E 2 plugin is judged largely on prompt faithfulness and text coherence, with comparisons suggesting it hits more required elements than competing generators.

Topics

  • Stable Animation SDK
  • PaLM 2 Integration
  • Bard Extensions
  • DALL-E 2 Plugin
  • AI Search and Browsing