LATEST AI Advances: Dreambooth, Midjourney V4, Photorealistic Text to Image Model & Google Imagen
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Midjourney V4 is being treated as the new benchmark for prompt-following and overall image coherence, with users comparing its results favorably against DALL·E 2—especially when the prompt demands a mashup of distinct concepts. Examples discussed include a “Darth Vader toilet,” where the model appears to understand both the character and the object well enough to produce an image that looks like it belongs in a coherent Star Wars-themed setting. Even when it stumbles—such as oddities around how objects should be positioned—the general direction is clear: Midjourney V4 is landing more consistently on the intended idea, and it’s also described as more affordable than DALL·E 2.
That momentum is now spilling into open tooling. Google's DreamBooth, a fine-tuning technique with open-source implementations for text-to-image systems like Stable Diffusion, is being used to "capture the essence" of Midjourney V4's look and apply it to Stable Diffusion. A free Google Colab workflow is highlighted as a way to generate sample images, with the discussion emphasizing how Stable Diffusion can inherit a similar style and improve perceived quality by training on Midjourney V4 outputs. The ethics question comes up immediately: training a model on another company's images raises concerns, but the conversation leans toward the idea that model improvement often relies on learning from existing work, making the practice seem less controversial than it might at first glance.
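The Colab itself isn't walked through step by step, but as a rough sketch of what such a DreamBooth fine-tuning run typically looks like (assuming the Hugging Face `diffusers` DreamBooth example script; the directory names, the "mjv4 style" token, and the hyperparameters below are illustrative placeholders, not details from the video):

```shell
# Illustrative sketch only: paths, the style token, and hyperparameters
# are placeholders, not taken from the video.
pip install diffusers accelerate transformers

# train_dreambooth.py is the DreamBooth example script shipped in the
# Hugging Face diffusers repository (examples/dreambooth/).
# --instance_data_dir points at a folder of sample images in the target
# style; --instance_prompt ties a placeholder token to that style.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./mjv4_samples" \
  --instance_prompt="an illustration in mjv4 style" \
  --output_dir="./sd-mjv4-style" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800
```

After training, the weights in the output directory load like any Stable Diffusion checkpoint, and including the placeholder token in a prompt steers generations toward the learned style.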
Another thread focuses on independent model-building and prompt search ecosystems. Lexica—an interface for browsing and searching millions of Stable Diffusion prompts and images—has users and developers experimenting with their own generation models. A creator associated with Lexica Art is said to be training a model that produces strikingly coherent, high-resolution-looking results, with special praise for faces and skin texture—areas that typically break down in text-to-image systems. The model isn’t presented as perfect (some prompt elements come out strange), but the early outputs are framed as unusually strong compared with many other “out of the gate” releases.
DreamBooth is also being folded into consumer-facing apps, including an iPhone app that offers DreamBooth-style personalization for free, though with limitations: users can generate images of selected famous figures rather than creating fully custom identities. The examples shown range from plausible results to clearly uncanny failures, underscoring both the speed of iteration and the unevenness of quality.
Pricing and platform shifts round out the roundup. Playground AI changes how it charges for DALL·E 2 access, turning it into a paid add-on rather than bundling it under a higher tier, while Craiyon (an image model previously popular on the platform) receives an update enabling higher-resolution 1024×1024 outputs. Runway ML is highlighted for video-centric tools like Infinite Image (outpainting-like expansion), image-to-image transformations, and inpainting, with the caveat that free usage is limited by project caps and output resolution.
Finally, OpenAI's DALL·E 2 API is described as now available, enabling other companies to integrate DALL·E 2 into their own products for a fee. The segment closes with Google's AI Test Kitchen, an iOS/Android app in a waitlist phase, positioned as a testing ground for Google's upcoming image generation capabilities, with Google's LaMDA mentioned as the text model currently available there.
Cornell Notes
Midjourney V4 is portrayed as a step up in prompt adherence and image coherence, sometimes outperforming DALL·E 2 in both “understanding” and artistic consistency. That improvement is being replicated through DreamBooth workflows: Google’s DreamBooth is used to train Stable Diffusion so it can mimic Midjourney V4’s style, with a free Colab option offered for experimentation. Lexica’s ecosystem is also driving independent model development, where a Lexica Art creator’s model is praised for unusually strong face and skin texture fidelity. Meanwhile, platforms like Playground AI and Runway ML are reshaping pricing and adding features such as higher-resolution generation and infinite image/outpainting-style editing. OpenAI’s DALL·E 2 API and Google’s AI Test Kitchen round out the shift toward easier integration and broader access.
Why is Midjourney V4 getting framed as a coherence and prompt-following winner?
How does DreamBooth connect Midjourney V4 style to Stable Diffusion outputs?
What makes the Lexica-linked model outputs stand out in the conversation?
What ethical concern is raised about training on Midjourney images, and how is it addressed?
How are Playground AI and Runway ML changing the user experience?
What new access points are mentioned for DALL·E 2 and Google’s image generation?
Review Questions
- Which specific capabilities (prompt adherence, coherence, face texture) are used as evidence for Midjourney V4’s perceived lead, and where does it still fail?
- How does DreamBooth training on Midjourney V4-style images change Stable Diffusion outputs, and what practical setup is offered to try it?
- Compare the strengths and limitations attributed to the Lexica-linked model versus Runway ML’s infinite image/outpainting approach.
Key Points
1. Midjourney V4 is credited with stronger prompt-following and scene coherence than DALL·E 2, though it still struggles with some complex spatial details.
2. DreamBooth workflows are being used to train Stable Diffusion to mimic Midjourney V4's style, with a free Google Colab option presented for experimentation.
3. Lexica's prompt/image ecosystem is linked to independent model training, with unusually strong face and skin texture fidelity highlighted.
4. Consumer apps are starting to offer DreamBooth-style generation for free, but often with constraints such as generating only from a fixed set of famous identities.
5. Playground AI is changing DALL·E 2 access pricing (DALL·E 2 becomes a $10 add-on) while adding higher-resolution 1024×1024 generation for Craiyon.
6. Runway ML is expanding video-centric editing features such as Infinite Image (outpainting-like expansion), image-to-image, and inpainting, with free tiers limited by project caps and resolution.
7. OpenAI's DALL·E 2 API and Google's AI Test Kitchen (with LaMDA now and Imagen expected) signal broader integration and access across platforms.