
ChatGPT 4 Learns to Use Midjourney - It Mastered Promptcrafting! (AI EXPERIMENTS)

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPT-4 can automate Midjourney promptcraft by outputting a ready-to-run “/imagine prompt” command with Midjourney V5 parameters.

Briefing

A new workflow pairs OpenAI’s GPT-4 with Midjourney by having GPT-4 generate fully formed “/imagine” prompts—including Midjourney parameters—so users can start with a single word or a messy natural-language request and still get a usable image prompt. The experiment matters because it targets one of the biggest friction points in image generation: promptcraft. Instead of learning Midjourney’s syntax and parameter system, a user can describe an idea in plain language while GPT-4 handles the translation into something Midjourney can reliably run.

The setup uses GPT-4 in OpenAI’s Playground with a system prompt that defines GPT-4 as an “AI image prompt generator” for Midjourney. Crucially, the system prompt includes Midjourney documentation and instructs GPT-4 to output a working Midjourney command using “/imagine prompt,” plus a small set of supported Midjourney V5 parameters—most notably aspect ratio (“--ar”) and stylization (“--s”). Early tests show GPT-4 can correctly produce prompts that Midjourney executes: a “whimsical cat” request becomes a “/imagine prompt” line with the right structure and parameters, and a single-word “cows” request turns into a pasture scene prompt that includes “--ar 16:9” and an “--s” stylize value. The results suggest GPT-4 is good at producing coherent, parameterized prompts even when the input is minimal.
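A minimal sketch of what such a system prompt might look like, written here as a Python string constant. The exact wording in the video differs, and the documentation excerpt is a placeholder—the real setup pastes in actual Midjourney docs:

```python
# Hypothetical sketch of the Playground system prompt; the wording and the
# documentation excerpt are placeholders, not the exact text from the video.
SYSTEM_PROMPT = """\
You are an AI image prompt generator for Midjourney.
Given any user idea, reply with exactly one working Midjourney command:
/imagine prompt: <detailed scene description> --ar <aspect ratio> --s <stylize value>
Use only Midjourney V5 parameters. Do not add commentary.
Midjourney documentation (excerpt): <pasted documentation goes here>
"""

print(SYSTEM_PROMPT)
```

The key design choice is that the prompt pins down the output format, so GPT-4's reply can be pasted into Discord verbatim rather than post-edited.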

As the natural-language inputs get more specific—or more “human” in their phrasing—the quality becomes uneven. When asked for a “goldfish” that is so large it fills the tank, GPT-4 produces prompts that yield photorealistic goldfish images, but Midjourney struggles with the exact spatial intent: the goldfish sometimes appears to overwhelm or distort the aquarium rather than cleanly matching the “single goldfish” concept. Similar issues show up when GPT-4 generates long, highly detailed prompts for complex scenes. A “cinematic movie scene” prompt intended for a confrontation produces cyberpunk-like imagery that doesn’t fully align with the intended narrative beats. A “loneliness” request works better, producing a solitary tree in an overcast field with muted colors.

The experiment also probes Midjourney’s limits with character mixing and constraints. A prompt combining “Mario” as a villain yields multiple strong results, including a menacing, “evil Mario” depiction with flames and a stormy dystopian setting. But a fruit-character prompt (“anthropomorphic lemon character” relaxing on a beach with a VR headset) shows character confusion: the pineapple companion and lemon elements blend rather than staying distinct. When GPT-4 invents a “never before seen color,” Midjourney returns visually interesting, swirling color imagery, though it effectively treats the request as an artistic description rather than a literal new hue.

Overall, the workflow demonstrates that GPT-4 can automate a large portion of Midjourney promptcraft—especially the syntax and parameter scaffolding—while Midjourney still enforces practical constraints on how precisely it can follow intricate, multi-object, or highly specific instructions. The most reliable gains come from simpler, single-subject prompts; the biggest failures cluster around overstuffed prompts and tightly specified composition requirements.

Cornell Notes

GPT-4 can be used as a Midjourney prompt generator: it takes a user’s plain-language idea and outputs a ready-to-run “/imagine prompt” plus Midjourney V5 parameters like aspect ratio (“--ar”) and stylization (“--s”). In tests, even one-word inputs such as “cows” become structured prompts that Midjourney can execute, producing coherent scenes. As requests become more complex—multiple characters, strict spatial intent, or long cinematic narratives—Midjourney sometimes misinterprets details, causing character mixing or composition drift. The workflow is most effective for promptcraft automation, but Midjourney’s own limitations still cap how precisely it can follow intricate instructions.

How does GPT-4 turn a simple idea into a Midjourney-ready command?

The system prompt instructs GPT-4 to output a fully working Midjourney prompt in the form “/imagine prompt …” and to include Midjourney V5 parameters. The experiment explicitly uses parameters such as “--ar 16:9” and “--s” (stylize), and it pastes Midjourney documentation into the system prompt so GPT-4 knows how to format the “/imagine” command. For example, the input “cows” becomes a structured pasture scene prompt with the correct Midjourney parameter syntax.
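The scaffolding GPT-4 adds around a scene description can be sketched as a small formatting helper. This is an illustrative reconstruction, not code from the video—in the actual experiment GPT-4 writes both the description and the parameters itself; the function and parameter names here are assumptions:

```python
from typing import Optional


def build_imagine_command(description: str, ar: str = "16:9",
                          stylize: Optional[int] = None) -> str:
    """Assemble a Midjourney-style /imagine command from a scene description.

    `ar` maps to Midjourney's --ar (aspect ratio) parameter and `stylize`
    to --s, following Midjourney V5 parameter syntax.
    """
    parts = [f"/imagine prompt: {description}", f"--ar {ar}"]
    if stylize is not None:
        parts.append(f"--s {stylize}")
    return " ".join(parts)


# A one-word idea like "cows" first gets expanded into a full scene
# description, then the parameter scaffolding is appended:
cmd = build_imagine_command(
    "a herd of cows grazing in a sunlit pasture, rolling green hills",
    ar="16:9",
    stylize=2,
)
print(cmd)
```

The point of the workflow is that GPT-4 performs both steps—inventing the description and appending valid parameters—so the user never has to learn the syntax this helper encodes.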

What kinds of inputs produce the most reliable results?

Short, single-concept requests tend to work best. A “Whimsical cat” prompt yields a cat floating among glowing stars, and “cows” produces a coherent pastoral scene. Even when the user input is informal or minimal, GPT-4’s parameterized structure helps Midjourney generate images that match the general subject and style direction.

Where does the workflow break down?

When the request requires precise composition, strict object separation, or complex narrative staging, Midjourney can drift. The goldfish test shows this: GPT-4 generates prompts for an enormous goldfish filling the tank, but Midjourney sometimes interprets the scale in a way that distorts the aquarium or fails to keep the “single goldfish” intent clean. Character-heavy prompts also cause confusion, such as the lemon-and-pineapple beach scene where the pineapple and lemon elements blend.

How does prompt length and detail affect outcomes?

More detail often increases the chance of misinterpretation. GPT-4 can produce very long, “advanced” prompts for cinematic scenes or multi-character illustrations, but Midjourney may struggle to keep every element straight. The Sherlock Holmes + Alice in Wonderland + Gandalf dancing prompt illustrates this: the characters get mixed up, with Gandalf appearing where Alice or Sherlock should be.

What do the Mario and “never before seen color” experiments reveal?

Mario-as-villain shows that GPT-4 can successfully convey a strong character transformation when the concept is clear: the results include a dark, menacing Mario with flames and a stormy dystopian Mushroom Kingdom vibe. The “never before seen color” test shows Midjourney’s tendency to treat such requests as artistic color descriptions—GPT-4 outputs a prompt describing swirling, iridescent cosmic hues, and Midjourney returns visually striking abstract color imagery rather than a literal new-spectrum color.

Review Questions

  1. When GPT-4 generates Midjourney prompts, which parameters are emphasized in the system setup, and how do they appear in the final “/imagine” output?
  2. Give one example where Midjourney misread a GPT-4 prompt detail (goldfish scale, character mixing, or cinematic narrative). What was the mismatch?
  3. Why might longer, “advanced” prompts lead to worse fidelity in multi-object scenes? Use one of the character-combination examples to support your answer.

Key Points

  1. GPT-4 can automate Midjourney promptcraft by outputting a ready-to-run “/imagine prompt” command with Midjourney V5 parameters.
  2. Including Midjourney documentation inside GPT-4’s system prompt helps it format prompts correctly and consistently.
  3. Single-subject, simpler requests (like “cows” or a whimsical cat) produce more reliable, coherent images.
  4. Highly specific spatial instructions (like a goldfish filling the tank) can be interpreted loosely, leading to composition drift or unintended distortion.
  5. Overstuffed, multi-character, or long “cinematic” prompts increase the likelihood of character confusion and narrative mismatch.
  6. Midjourney can still deliver strong results when the creative transformation is clear, as in “Mario as a villain.”
  7. Even when GPT-4 invents abstract concepts (like a “never before seen color”), Midjourney tends to render them as artistic descriptions rather than literal new physics of color.

Highlights

A one-word input (“cows”) becomes a structured Midjourney V5 prompt with “--ar 16:9” and an “--s” stylize parameter, letting users skip syntax learning.
Character-heavy prompts often collapse into mixed identities—Alice, Sherlock, and Gandalf get scrambled when the prompt gets too packed.
The goldfish test shows that “scale” instructions can be followed visually but not always with the exact intent (single goldfish vs. overwhelming aquarium distortion).
“Mario as the villain” produces some of the strongest results, suggesting GPT-4 works best when the transformation is unambiguous.

Mentioned

  • GPT-4
  • --ar (aspect ratio)
  • --s (stylize)