Does DALL-E 3 Have Competition? Open Source GPT-4 Vision & more! | AI NEWS
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Adobe is rolling out a major upgrade to its Firefly image generator, positioning the new Firefly Image 2 model as a serious alternative for creators who care about realism, controllability, and production-ready output, especially inside Adobe’s own workflow. Firefly’s outputs are described as commercially usable, and prompts are claimed to work across 100+ languages. The update lands in the standalone Firefly web app, which is meant to compete with mainstream text-to-image systems, and also ties into tools like Photoshop (Generative Fill and Generative Expand) and Illustrator (generative recoloring).
The headline change is Firefly Image 2, which Adobe says improves creator control and image quality while generating faster and being easier to use. Users can toggle between the new Image 2 model and the older Image 1 model. Beyond the model swap, the web interface adds new controls aimed at making outputs look more like real photography: a “visual intensity” slider that shifts results toward more artistic or more photographic looks; a style-strength slider; and a photo-settings panel that exposes camera-like parameters such as aperture (from f/1.2 to f/22), shutter speed (from 1/12000 s to 10 s), and field of view (roughly 14 mm to 300 mm+), plus negative prompting. The interface also supports style transfer: uploading a reference image and applying its look to a new generation.
Community testing and side-by-side comparisons suggest Firefly’s strengths cluster around fine-grained realism and detail, particularly in hair, skin texture, and lighting, while other models may win on raw prompt adherence or avoid specific failure modes. In one set of comparisons, DALL-E 3 is described as more literal and stronger at text generation (Firefly is portrayed as weak or absent at rendering text), while Midjourney is credited with strong results but sometimes shows oddities in backgrounds or objects. Firefly is repeatedly praised for looking “photo-like,” down to wrinkles and individual hair strands, though it is also shown producing occasional malformed artifacts and “bug” images where coherence breaks down.
A standout feature in the hands-on demos is style transfer: applying the style of an uploaded image (including the creator’s own channel logo) to the same underlying concept. The results range from convincing hybrids (a cat-dog blend) to intentionally disturbing outcomes when a “decrepit” face image is used, framed as “nightmare fuel.” Adobe team members also offer guidance: the built-in lighting and intensity controls can steer results toward more authentic photo aesthetics, and prompt-alignment improvements can reduce common failures like extra fingers.
Pricing is presented as credit-based: a free tier includes 25 monthly generative credits, while a premium tier offers 100 monthly credits for $5/month, though the channel host suggests heavy Adobe users may effectively get unlimited generations through existing subscriptions. The broader competitive landscape remains unsettled: Firefly is not portrayed as a DALL-E 3 killer overall, but it is framed as a strong option for product photography and realism-focused workflows.
The update comes alongside another major theme in AI image generation: safety restrictions. DALL-E 3’s image generation inside Microsoft’s Image Creator is described as tightening, with inconsistent enforcement that sometimes blocks prompts that should be allowed while letting others slip through. The segment also touches on open-source vision alternatives (LLaVA, available on Hugging Face), a claim that GPT-4 Vision decoded redacted NASA text with near-100% accuracy in one test, and a broader discussion of AI trajectories, including talk of GPT-4.5 and timelines for artificial superintelligence.
Cornell Notes
Adobe’s Firefly Image 2 update aims to compete with leading text-to-image systems by improving realism and, more importantly, giving creators more control. The Firefly web app adds photo-style controls such as visual intensity, style strength, and camera-like settings (aperture, shutter speed, field of view), plus negative prompting. Community comparisons suggest Firefly often excels at photo-like detail (hair, wrinkles, lighting) and product-style realism, while DALL-E 3 is credited with stronger text generation and sometimes better prompt adherence. A major differentiator is style transfer: uploading a reference image and applying its look to a new concept can produce consistent, sometimes surprising results. Pricing is credit-based on top of Adobe’s ecosystem, with free and low-cost tiers described alongside potential subscription benefits.
What’s the biggest practical change in Adobe Firefly Image 2 beyond “a new model”?
How do community comparisons position Firefly versus DALL-E 3 and Midjourney?
Why does style transfer matter in the Firefly update?
What specific interface controls are highlighted as making outputs look more like real photos?
What does the transcript suggest about safety restrictions in DALL-E 3’s ecosystem?
What open-source vision alternative is mentioned, and how is it evaluated?
Review Questions
- Which Firefly controls (sliders or camera parameters) are most directly tied to shaping photographic realism, and what does each one influence?
- In the comparisons, what are the recurring reasons DALL-E 3 is favored over Firefly (and vice versa)?
- How does style transfer differ from ordinary text-to-image prompting, and what evidence from the demos supports that difference?
Key Points
1. Adobe’s Firefly Image 2 update focuses on improved realism and creator control, with faster generation and the ability to toggle between Image 2 and Image 1.
2. The Firefly web app adds photo-oriented controls: visual intensity, style strength, camera-like settings (aperture, shutter speed, field of view), and negative prompting.
3. Community comparisons credit Firefly with strong photo-like detail (hair, skin, wrinkles), while DALL-E 3 is credited with better text generation and sometimes stronger prompt adherence.
4. A major differentiator is style transfer via image upload, enabling consistent style application to new concepts rather than only one-off outputs.
5. Firefly is presented as commercially usable and multilingual in prompt handling (100+ languages), with pricing based on monthly generative credits.
6. Safety restrictions around DALL-E 3’s image generation are described as tightening and inconsistently enforced, affecting even some simpler prompts.
7. Open-source vision alternatives like LLaVA are highlighted as usable in-browser options, though not at GPT-4V quality.