
Does DALL-E 3 Have Competition? Open Source GPT-4 Vision & more! | AI NEWS

MattVidPro · 6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Adobe’s Firefly Image 2 update focuses on improved realism and creator control, with faster generation and the ability to toggle between Image 2 and Image 1.

Briefing

Adobe is rolling out a major upgrade to its Firefly image generator, positioning the new Firefly Image 2 model as a serious alternative for creators who care about realism, controllability, and production-ready output—especially inside Adobe’s own workflow. Firefly’s outputs are described as commercially usable, and prompts are claimed to work across 100+ languages. The update lands in the Firefly web app and also ties into tools like Photoshop (Generative Fill and Generative Expand), Illustrator (generative recoloring), and a standalone Firefly web experience meant to compete with mainstream text-to-image systems.

The headline change is Firefly Image 2, which Adobe says improves creator control and image quality, along with faster generation and easier use. Users can toggle between “image 2” and the older “image 1” model. Beyond the model swap, the web interface adds new controls aimed at making outputs look more like real photography: a “visual intensity” slider to shift toward more artistic or more photographic results; a style strength slider; and a photo-settings panel that exposes camera-like parameters such as aperture (down to f1.2 and up to f22), shutter speed (from 1/12000 to 10 seconds), field of view (roughly 14mm to 300mm+), plus negative prompting. The interface also supports “style” transfer by uploading a reference image and applying its look to a new generation.

Community testing and side-by-side comparisons suggest Firefly’s strengths cluster around fine-grained realism and detail, particularly in hair, skin texture, and lighting, while other models may win on raw prompt adherence or specific failure modes. In one set of comparisons, DALL-E 3 is described as more literal and stronger at text generation (Firefly is portrayed as weak or absent for text), while Midjourney is credited with strong results but sometimes shows background or object oddities. Firefly is repeatedly praised for looking “photo-like,” including wrinkles and hair strands, though it’s also shown producing occasional malformed artifacts and “bug” images where coherence breaks.

A standout feature in the hands-on demos is style transfer: applying the style of an uploaded image (including the creator’s own channel logo) to the same underlying concept. The results range from convincing hybrids (a cat-dog blend) to intentionally disturbing outcomes when a “decrepit” face image is used—framed as “nightmare fuel.” Adobe also adds guidance from team members: using built-in lighting and intensity controls can steer results toward more authentic photo aesthetics, and text alignment improvements can prevent common issues like extra fingers.

Pricing is presented as credit-based: a free tier includes 25 monthly generative credits, while a premium tier offers 100 monthly credits for $5/month, though the channel host suggests heavy Adobe users may effectively get unlimited generations through existing subscriptions. The broader competitive landscape remains unsettled: Firefly is not portrayed as a DALL-E 3 killer overall, but it is framed as a strong option for product photography and realism-focused workflows.

The update comes alongside another major theme in AI image generation: safety restrictions. DALL-E 3’s image generation inside Microsoft’s Image Creator is described as tightening, with inconsistent enforcement: some prompts that should be allowed are blocked, while other prompts slip through. The segment also touches open-source vision alternatives (LLaVA on Hugging Face), claims about GPT-4 Vision decoding redacted NASA text with near-100% accuracy in one test, and a broader AI trajectory discussion that includes talk of GPT-4.5 and timelines for artificial superintelligence.

Cornell Notes

Adobe’s Firefly Image 2 update aims to compete with leading text-to-image systems by improving realism and, more importantly, giving creators more control. The Firefly web app adds photo-style controls such as visual intensity, style strength, and camera-like settings (aperture, shutter speed, field of view), plus negative prompting. Community comparisons suggest Firefly often excels at photo-like detail (hair, wrinkles, lighting) and product-style realism, while DALL-E 3 is credited with stronger text generation and sometimes better prompt adherence. A major differentiator is style transfer: uploading a reference image and applying its look to a new concept can produce consistent, sometimes surprising results. Pricing is credit-based on top of Adobe’s ecosystem, with free and low-cost tiers described alongside potential subscription benefits.

What’s the biggest practical change in Adobe Firefly Image 2 beyond “a new model”?

The Firefly web interface adds controls designed for photographic outcomes. Users get a “visual intensity” slider to move between more artistic and more photographic results, a style strength slider, and a photo-settings panel with camera-like parameters: aperture (f1.2 to f22), shutter speed (1/12000 to 10 seconds), and field of view (about 14mm to 300mm+). It also includes negative prompting and “auto” photo settings, giving creators more levers than a simple prompt box.

How do community comparisons position Firefly versus DALL-E 3 and Midjourney?

In side-by-side examples, DALL-E 3 is described as more literal and stronger at text generation, while Midjourney is praised for detail but sometimes produces background or object oddities. Firefly is repeatedly favored for realism cues, especially hair strands, skin detail, and wrinkles, and for fitting prompts in a “photo-like” way. The tradeoff is that Firefly can still produce coherence failures or malformed artifacts in some generations.

Why does style transfer matter in the Firefly update?

Style transfer lets users upload a reference image and apply its style to a new generation while keeping the new concept. In the demos, applying a channel logo’s style to a lemon-themed cityscape is framed as a “killer feature,” because it can preserve recognizable style nuances rather than just producing a one-off image. The same mechanism can also create hybrids (e.g., cat-dog) or intentionally disturbing results when the reference image is extreme.

What specific interface controls are highlighted as making outputs look more like real photos?

Team guidance and demos emphasize the visual intensity slider and built-in lighting controls. Keeping intensity around the middle is described as producing a studio-portrait look, while adding terms like “hyper realistic” can push toward social-media photo aesthetics. The key point is that the controls can steer results similarly whether used via prompts or via the interface settings.

What does the transcript suggest about safety restrictions in DALL-E 3’s ecosystem?

Safety filters are described as tightening and becoming less predictable. The host reports that some prompts can be blocked as “unsafe image content” even when similar prompts sometimes pass, suggesting inconsistent enforcement. The segment also gives an example of a prompt being flagged after adding explosives to a copyrighted character reference, and it notes creators seeing blocks on simpler prompts too.

What open-source vision alternative is mentioned, and how is it evaluated?

LLaVA is mentioned as an open-source vision chatbot hosted on Hugging Face, usable in a browser for free. It’s described as not matching GPT-4V quality, but “serviceable and usable,” with an example where the model accurately describes an image involving a man kicking an alligator in water. The transcript also notes that Georgi Gerganov reportedly ran it on an M2 Max MacBook.

Review Questions

  1. Which Firefly controls (sliders or camera parameters) are most directly tied to shaping photographic realism, and what does each one influence?
  2. In the comparisons, what are the recurring reasons DALL-E 3 is favored over Firefly (and vice versa)?
  3. How does style transfer differ from ordinary text-to-image prompting, and what evidence from the demos supports that difference?

Key Points

  1. Adobe’s Firefly Image 2 update focuses on improved realism and creator control, with faster generation and the ability to toggle between Image 2 and Image 1.
  2. The Firefly web app adds photo-oriented controls: visual intensity, style strength, camera-like settings (aperture, shutter speed, field of view), and negative prompting.
  3. Community comparisons credit Firefly with strong photo-like detail (hair, skin, wrinkles) and prompt fit, while DALL-E 3 is credited with better text generation and sometimes stronger prompt adherence.
  4. A major differentiator is style transfer via image upload, enabling consistent style application to new concepts rather than only one-off outputs.
  5. Firefly is presented as commercially usable and multilingual in prompt handling (100+ languages), with pricing based on monthly generative credits.
  6. Safety restrictions around DALL-E 3’s image generation are described as tightening and inconsistently enforced, affecting even some simpler prompts.
  7. Open-source vision alternatives like LLaVA are highlighted as usable in-browser options, though not at GPT-4V quality.

Highlights

Firefly Image 2 isn’t just a model upgrade—its web interface adds camera-style controls (aperture, shutter speed, field of view) and negative prompting to steer realism.
Style transfer via uploaded images is framed as the update’s “killer feature,” producing consistent style application across new concepts.
DALL-E 3’s strengths in the comparisons include text generation, while Firefly’s strengths skew toward photo-like detail and controllability.
Safety filters for DALL-E 3 are portrayed as increasingly restrictive and uneven, with prompts sometimes blocked unpredictably.
