Midjourney Surpasses DALL-E 2 - Incredible Midjourney V4 Upgrade

TL;DR

Midjourney V4 is presented as a substantial improvement over Midjourney V3 in prompt coherence and background integration.

Briefing

Midjourney V4’s public release is being treated as a real turning point: with the same prompts used to benchmark earlier models, V4 produces more coherent, more “art-directed” images than Midjourney V3 and often matches or beats DALL·E 2 on detail and prompt fidelity. The practical takeaway is simple—people who previously chose DALL·E 2 for consistency and Midjourney for style now have a stronger case that Midjourney V4 can deliver both, at least across a range of common prompt types.

A core comparison centers on a straightforward test: “penguin in Venice.” Midjourney V3 already renders the scene with recognizable elements, but V4 adds noticeably sharper structure and clearer environmental cues—water, boats, buildings, and color palette—while keeping the subject coherent. DALL·E 2, by contrast, is described as leaning more toward photo realism, producing images that can look more like actual photographs, but with different artistic interpretations of the same prompt. The result is framed as subjective: DALL·E 2 may win for “photographic” output, while Midjourney V4 is credited with higher artistic coherence and stronger background integration.

The transcript then shifts from one-off tests to a broader “through-the-wringer” set of prompts. A “lemon wearing sunglasses relaxing on the beach” prompt is used to show that Midjourney V4 can generate more cohesive character styling and background integration than Midjourney V3, while DALL·E 2’s outputs are described as more photographic but sometimes less detailed or more “mushy.” For a “character concept old Mage warlock” prompt, Midjourney V4 is portrayed as adding richer character detail from a short prompt (more expressive faces, stronger visual identity, and clearer magical elements), where DALL·E 2 is characterized as producing simpler, clip-art-like results unless more prompt work is done.

Content-policy differences also enter the comparison. When generating a detailed portrait depicting Walter White from Breaking Bad, DALL·E 2 is said to be constrained by policy, leading to vague resemblance, while Midjourney is described as producing a more recognizable likeness (wrinkles, hair color, and overall facial features). Another OpenAI-style benchmark—an armchair in the shape of an avocado—serves as a creativity and texture test, where Midjourney is credited with more detailed, layered avocado skin and pit features, and with upscaled results that retain the concept more convincingly.

Across more complex scenes (a Shih Tzu puppy dressed as a pirate sailing on a pirate ship, an iPhone selfie of Bigfoot and the Loch Ness monster playing video games, and various logo-style prompts), Midjourney V4 is repeatedly described as producing clearer faces, better environmental detail, and more consistent concept execution. The Bigfoot “iPhone selfie” prompt is a notable exception where neither model performs perfectly, but Midjourney is still framed as producing more coherent results overall.

By the end, the verdict is blunt: Midjourney V4 is portrayed as going head-to-head with DALL·E 2, with the added argument that Midjourney’s improvements over V3 are large enough to justify switching or at least testing it again. Cost is also mentioned as a factor in the broader “fight” between the two systems, with Midjourney positioned as the cheaper option while closing the quality gap.

Cornell Notes

Midjourney V4’s release is presented as a major quality jump over Midjourney V3, especially for prompt coherence and artistic detail. Using repeated benchmarks with the same prompts, V4 is credited with clearer subject-background integration (e.g., “penguin in Venice”) and richer character concepts (e.g., “old Mage warlock”) even from short prompts. DALL·E 2 is still described as strong for photo-realistic output, but it’s portrayed as less detailed or more “mushy” in several side-by-side tests. Content policy constraints are also highlighted: DALL·E 2 is said to struggle with generating a Walter White portrait, while Midjourney produces a more recognizable likeness. Overall, the comparison lands on Midjourney V4 being competitive with DALL·E 2, with cost mentioned as an extra advantage.

What benchmark prompt best illustrates the coherence jump from Midjourney V3 to V4?

The “penguin in Venice” prompt. Midjourney V3 is described as already producing a recognizable scene, but V4 adds stronger structural detail and clearer environmental cues—water, boats, styled buildings, and even color palette—while keeping the penguin and setting consistently readable. DALL·E 2 is portrayed as more photo-realistic, but with different interpretations, making the “winner” partly subjective.

How does the comparison treat “artistic coherence” versus “photo realism”?

Midjourney V4 is repeatedly credited with being more artistically focused while still maintaining realism cues, leading to images that feel more integrated (subject plus background). DALL·E 2 is framed as producing more photographic results, but sometimes with less detailed or less crisp concept execution—described as mushy or less coherent in certain tests. The transcript treats this as preference: some viewers may prefer DALL·E 2’s photographic look, while others prefer Midjourney’s more cohesive art direction.

Why does the Walter White test produce a different outcome for DALL·E 2 and Midjourney?

The transcript claims DALL·E 2’s content policy limits what the model knows or will generate about Walter White from Breaking Bad, resulting in outputs that only vaguely resemble him. Midjourney is described as not facing the same restriction, producing a detailed portrait with recognizable features like signature wrinkles and hair color. This is used to argue that policy constraints can affect benchmark fairness.

What does the “armchair in the shape of an avocado” prompt reveal about creativity and texture?

It’s used as a concept-combination benchmark. Midjourney is credited with more detailed avocado textures (skin, internal layers, and the pit), plus reflections and “oiliness” cues that evoke avocado oil. The transcript claims that even the best DALL·E 2 result doesn’t match the detail and coherence seen in Midjourney’s armchair-avocado outputs, including after upscaling.

How do the models compare on short prompt character concepts like “old Mage warlock”?

Midjourney V4 is portrayed as adding substantial creative detail from a short prompt: distinct character traits, richer faces, and clearer magical elements. DALL·E 2 outputs are described as simpler and more “clip art”-like unless more prompt specificity is provided. The transcript also notes limited variation across Midjourney’s generated warlocks (similar cloak, skin tone, body type), suggesting strong quality but less exploration of the latent space.

Which prompt category is used to test logo-style outputs?

App/logo prompts for frog photographers. The transcript says DALL·E 2 often produces generic frog app logos, while Midjourney produces more detailed, close-up, and sometimes more scene-like logo concepts (e.g., frogs with cameras). The conclusion is that Midjourney tends to produce cooler, more specific logo imagery in these tests.

Review Questions

In the “penguin in Venice” comparison, what specific visual elements are cited as improving from V3 to V4?
How does content policy influence the Walter White portrait results, and why does that matter for interpreting benchmark outcomes?
Across the lemon and warlock prompts, what pattern suggests Midjourney V4’s strengths with short prompts?

Key Points

1. Midjourney V4 is presented as a substantial improvement over Midjourney V3 in prompt coherence and background integration.
2. The “penguin in Venice” test highlights V4’s clearer environmental structure (water, boats, buildings) while keeping the subject consistent.
3. DALL·E 2 is repeatedly characterized as stronger for photo-realistic output, but sometimes less detailed or less cohesive in side-by-side comparisons.
4. Short, concept-heavy prompts (like “old Mage warlock”) tend to produce richer, more characterful results in Midjourney V4 than in DALL·E 2.
5. Content policy constraints can materially affect benchmark comparisons, illustrated by the Walter White portrait test.
6. Midjourney V4 is described as competitive with DALL·E 2 across multiple prompt types, including concept mashups and logo-style prompts.
7. Cost is mentioned as an additional reason many users may prefer to try Midjourney V4 even if DALL·E 2 remains a strong option.

Highlights

Midjourney V4’s “penguin in Venice” output is credited with noticeably higher coherence than Midjourney V3, keeping Venice-defining details readable and consistent.

The Walter White portrait test is used to show how policy constraints can limit DALL·E 2’s ability to generate recognizable likenesses.

In the armchair-avocado benchmark, Midjourney is credited with more convincing texture detail—skin, pit, and reflective cues—especially after upscaling.

Across character concepts and logo prompts, Midjourney V4 is repeatedly framed as producing more detailed, concept-faithful results than DALL·E 2, though DALL·E 2 can still win on “photographic” style.

Topics

Image Generation Benchmarks
Midjourney V4
DALL·E 2 Comparison
Prompt Coherence
Content Policy Effects