
LATEST AI Advances: Dreambooth, Midjourney V4, Photorealistic Text to Image Model & Google Imagen

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Midjourney V4 is credited with stronger prompt-following and scene coherence than DALL·E 2, though it still struggles with some complex spatial details.

Briefing

Midjourney V4 is being treated as the new benchmark for prompt-following and overall image coherence, with users comparing its results favorably against DALL·E 2—especially when the prompt demands a mashup of distinct concepts. Examples discussed include a “Darth Vader toilet,” where the model appears to understand both the character and the object well enough to produce an image that looks like it belongs in a coherent Star Wars-themed setting. Even when it stumbles—such as oddities around how objects should be positioned—the general direction is clear: Midjourney V4 is landing more consistently on the intended idea, and it’s also described as more affordable than DALL·E 2.

That momentum is now spilling into open tooling. Google's DreamBooth—a fine-tuning technique from Google Research whose open-source implementations can be applied to text-to-image systems like Stable Diffusion—is being used to “capture the essence” of Midjourney V4’s look and transfer it to Stable Diffusion. A free Google Colab workflow is highlighted as a way to generate sample images, with the discussion emphasizing how Stable Diffusion can inherit a similar style and improve perceived quality by training on Midjourney V4 outputs. The ethics question comes up immediately: training a model on another company’s images raises concerns, but the conversation leans toward the idea that model improvement often relies on learning from existing work, making the practice feel less controversial than it might at first glance.

Another thread focuses on independent model-building and prompt search ecosystems. Lexica—an interface for browsing and searching millions of Stable Diffusion prompts and images—has users and developers experimenting with their own generation models. A creator associated with Lexica Art is said to be training a model that produces strikingly coherent, high-resolution-looking results, with special praise for faces and skin texture—areas that typically break down in text-to-image systems. The model isn’t presented as perfect (some prompt elements come out strange), but the early outputs are framed as unusually strong compared with many other “out of the gate” releases.

DreamBooth is also being folded into consumer-facing apps, including an iPhone app that offers DreamBooth-style personalization for free, though with limitations: users can generate images of selected famous figures rather than creating fully custom identities. The examples shown range from plausible results to clearly uncanny failures, underscoring both the speed of iteration and the unevenness of quality.

Pricing and platform shifts round out the roundup. Playground AI changes how it charges for DALL·E 2 access—turning it into a paid add-on rather than bundling it under a higher tier—while Craiyon (an image model previously popular on the platform) receives an update enabling higher-resolution 1024×1024 outputs. Runway ML is highlighted for video-centric tools like infinite image (outpainting-like expansion), image-to-image transformations, and inpainting, with a caveat that free usage is limited by project caps and output resolution.

Finally, OpenAI’s DALL·E 2 API is described as now available, enabling other companies to integrate DALL·E 2 into their own products for a fee. The segment closes with Google’s AI Test Kitchen—an iOS/Android app in a waitlist phase—positioned as a testing ground for Google’s upcoming image generation capabilities, with Google’s LaMDA mentioned as the current text model available there.

Cornell Notes

Midjourney V4 is portrayed as a step up in prompt adherence and image coherence, sometimes outperforming DALL·E 2 in both “understanding” and artistic consistency. That improvement is being replicated through DreamBooth workflows: Google’s DreamBooth is used to train Stable Diffusion so it can mimic Midjourney V4’s style, with a free Colab option offered for experimentation. Lexica’s ecosystem is also driving independent model development, where a Lexica Art creator’s model is praised for unusually strong face and skin texture fidelity. Meanwhile, platforms like Playground AI and Runway ML are reshaping pricing and adding features such as higher-resolution generation and infinite image/outpainting-style editing. OpenAI’s DALL·E 2 API and Google’s AI Test Kitchen round out the shift toward easier integration and broader access.

Why is Midjourney V4 getting framed as a coherence and prompt-following winner?

The discussion centers on how well Midjourney V4 merges distinct prompt concepts into a single believable scene—using the “Darth Vader toilet” example to illustrate that the model can combine character identity and object identity in a way that looks internally consistent. It’s also described as more affordable than DALL·E 2, with the caveat that it still produces errors in complex spatial details (e.g., how an object should be tucked or positioned).

How does DreamBooth connect Midjourney V4 style to Stable Diffusion outputs?

DreamBooth is described as open-source software that can be applied to text-to-image models like Stable Diffusion. The workflow highlighted trains Stable Diffusion on images intended to reflect Midjourney V4’s “essence,” so the resulting generations inherit a similar look. Sample images are presented as evidence that Stable Diffusion can “leech off” Midjourney V4 style while increasing perceived quality, and the process is offered via a free Google Colab setup.
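At the data level, the workflow above boils down to fine-tuning on a small set of “instance” images (here, Midjourney V4 outputs) whose prompts contain a distinctive style token, optionally mixed with “class” images for prior preservation so the base model doesn’t forget what it already knows. A minimal sketch of how such training pairs are assembled—the token, file names, and helper function are illustrative, not from the video or any specific Colab:

```python
# Hypothetical sketch of DreamBooth-style data preparation.
# "mdjrny-v4 style" mirrors the kind of rare token community
# Midjourney-style checkpoints used; it is an assumption here.
INSTANCE_TOKEN = "mdjrny-v4 style"

def make_training_pairs(instance_images, class_images,
                        instance_prompt, class_prompt):
    """Return (image, prompt) pairs as a DreamBooth fine-tuning loop
    would consume them: instance images carry the style token,
    class images keep the base model's prior intact."""
    pairs = [(img, instance_prompt) for img in instance_images]
    pairs += [(img, class_prompt) for img in class_images]  # prior preservation
    return pairs

# Illustrative usage: two Midjourney V4 outputs plus one base-model image.
pairs = make_training_pairs(
    ["mj_001.png", "mj_002.png"],
    ["base_001.png"],
    f"a portrait, {INSTANCE_TOKEN}",
    "a portrait",
)
print(len(pairs))  # 3
```

The actual training step (updating the diffusion model's weights on these pairs) is what the Colab notebook automates; this only shows the pairing idea.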

What makes the Lexica-linked model outputs stand out in the conversation?

The standout claims are coherence and realism, especially around faces and skin texture—areas that often fail in text-to-image generation. The model is said to produce high-resolution-looking results and to be accurate enough that viewers struggle to tell whether some faces are real without zooming into fine details. It still shows prompt-related oddities (like strange collar/double-collar artifacts), but the overall fidelity is compared favorably to many other models.

What ethical concern is raised about training on Midjourney images, and how is it addressed?

The concern is whether it’s ethical to train a new model using Midjourney’s images, given Midjourney is a separate company with its own model. The response in the discussion leans on the idea that learning from existing work is part of improving systems—framing it as analogous to how archery improvement involves learning from prior techniques rather than treating all cross-training as inherently unethical.

How are Playground AI and Runway ML changing the user experience?

Playground AI is described as altering pricing so DALL·E 2 becomes a $10 add-on rather than bundled under a higher-priced Pro tier, and Craiyon gains higher-resolution 1024×1024 generation. Runway ML is framed as expanding video-focused editing capabilities: green/blur background tools, inpainting, and “infinite image” (outpainting-like expansion) plus image-to-image transformations. Free usage is described as limited by a project cap and output resolution (e.g., 720p), even if generation volume is otherwise flexible.

What new access points are mentioned for DALL·E 2 and Google’s image generation?

OpenAI’s DALL·E 2 is said to have its own API, allowing other companies to integrate DALL·E 2 into their products for a fee. Separately, Google’s AI Test Kitchen is described as an iOS/Android app requiring a waitlist, with invites going out quickly; it currently includes Google’s LaMDA (text) and is expected to add Google Imagen for image generation later.

Review Questions

  1. Which specific capabilities (prompt adherence, coherence, face texture) are used as evidence for Midjourney V4’s perceived lead, and where does it still fail?
  2. How does DreamBooth training on Midjourney V4-style images change Stable Diffusion outputs, and what practical setup is offered to try it?
  3. Compare the strengths and limitations attributed to the Lexica-linked model versus Runway ML’s infinite image/outpainting approach.

Key Points

  1. Midjourney V4 is credited with stronger prompt-following and scene coherence than DALL·E 2, though it still struggles with some complex spatial details.

  2. DreamBooth workflows are being used to train Stable Diffusion to mimic Midjourney V4’s style, with a free Google Colab option presented for experimentation.

  3. Lexica’s prompt/image ecosystem is linked to independent model training, where unusually strong face and skin texture fidelity is highlighted.

  4. Consumer apps are starting to offer DreamBooth-style generation for free, but often with constraints like generating from a fixed set of famous identities.

  5. Playground AI is changing DALL·E 2 access pricing (DALL·E 2 as a $10 add-on) while adding higher-resolution 1024×1024 generation for Craiyon.

  6. Runway ML is expanding video editing features such as infinite image (outpainting-like expansion), image-to-image, and inpainting, with free tiers limited by project caps and resolution.

  7. OpenAI’s DALL·E 2 API and Google’s AI Test Kitchen (with LaMDA now and Imagen expected) signal broader integration and access across platforms.

Highlights

Midjourney V4 is repeatedly judged on whether it can merge unrelated prompt elements into a single believable scene—like “Darth Vader toilet”—more consistently than DALL·E 2.
DreamBooth is positioned as a bridge that lets Stable Diffusion adopt a Midjourney-like look by training on Midjourney V4 outputs.
The Lexica-linked model is praised for face realism and skin texture, with viewers needing fine-detail checks to confirm whether images look “real.”
Runway ML’s “infinite image” is described as an outpainting-style editor for expanding images while keeping the subject consistent.
OpenAI’s DALL·E 2 API and Google’s AI Test Kitchen point to faster adoption of image generation through apps and third-party integrations.

Topics

  • Midjourney V4
  • DreamBooth
  • Stable Diffusion
  • Lexica
  • Runway ML
  • Playground AI
  • DALL·E 2 API
  • AI Test Kitchen