ChatGPT 4 Learns to Use Midjourney - It Mastered Promptcrafting! (AI EXPERIMENTS)
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
GPT-4 can automate Midjourney promptcraft by outputting a ready-to-run “/imagine prompt” command with Midjourney V5 parameters.
Briefing
A new workflow pairs OpenAI’s GPT-4 with Midjourney by having GPT-4 generate fully formed “/imagine” prompts—including Midjourney parameters—so users can start with a single word or a messy natural-language request and still get a usable image prompt. The experiment matters because it targets one of the biggest friction points in image generation: promptcraft. Instead of learning Midjourney’s syntax and parameter system, a user can describe an idea in plain language while GPT-4 handles the translation into something Midjourney can reliably run.
The setup uses GPT-4 in OpenAI’s Playground with a system prompt that defines GPT-4 as an “AI image prompt generator” for Midjourney. Crucially, the system prompt includes Midjourney documentation and instructs GPT-4 to output a working command beginning with “/imagine prompt,” followed by a small set of supported Midjourney V5 parameters, most notably aspect ratio (“--ar”) and stylization (“--s”). Early tests show GPT-4 can produce prompts that Midjourney executes correctly: a “whimsical cat” request becomes an “/imagine prompt” line with the right structure and parameters, and a single-word “cows” request turns into a pasture-scene prompt that includes “--ar 16:9” and “--s 2.” The results suggest GPT-4 is good at producing coherent, parameterized prompts even when the input is minimal.
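The setup described above can be sketched in code. The system-prompt wording below is a hypothetical reconstruction (the video does not show its exact text), and the example command is illustrative rather than the video’s verbatim output; the small regex check simply verifies that a reply has the ready-to-run “/imagine prompt” shape with V5-style parameters:

```python
import re

# Hypothetical reconstruction of the video's system prompt; the exact
# wording and the pasted Midjourney documentation are assumptions.
SYSTEM_PROMPT = """You are an AI image prompt generator for Midjourney.
Reply with exactly one ready-to-run command in the form:
/imagine prompt: <scene description> --ar <W>:<H> --s <value>
Use only Midjourney V5 parameters from the documentation below.
<Midjourney V5 parameter documentation pasted here>"""

# An illustrative command of the kind GPT-4 produced for the one-word
# "cows" request (not the video's verbatim output):
EXAMPLE = ("/imagine prompt: A serene pasture with grazing cows "
           "at golden hour --ar 16:9 --s 2")

# Require the /imagine prefix, a description, an aspect ratio,
# and an optional stylization flag.
COMMAND_RE = re.compile(r"^/imagine prompt: .+ --ar \d+:\d+( --s \d+)?$")

def is_wellformed(command: str) -> bool:
    """Check that a reply is a single, ready-to-run /imagine command."""
    return COMMAND_RE.match(command.strip()) is not None
```

A check like this would catch the failure mode where the model replies with plain prose (“draw some cows”) instead of a runnable command.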
As the natural-language inputs get more specific—or more “human” in their phrasing—the quality becomes uneven. When asked for a “goldfish” that is so large it fills the tank, GPT-4 produces prompts that yield photorealistic goldfish images, but Midjourney struggles with the exact spatial intent: the goldfish sometimes appears to overwhelm or distort the aquarium rather than cleanly matching the “single goldfish” concept. Similar issues show up when GPT-4 generates long, highly detailed prompts for complex scenes. A “cinematic movie scene” prompt intended for a confrontation produces cyberpunk-like imagery that doesn’t fully align with the intended narrative beats. A “loneliness” request works better, producing a solitary tree in an overcast field with muted colors.
The experiment also probes Midjourney’s limits with character mixing and constraints. A prompt recasting Mario as a villain yields multiple strong results, including a menacing “evil Mario” depiction with flames and a stormy, dystopian setting. But a fruit-character prompt (an anthropomorphic lemon character relaxing on a beach with a VR headset, accompanied by a pineapple) shows character confusion: the pineapple and lemon elements blend rather than staying distinct. When GPT-4 invents a “never before seen color,” Midjourney returns visually interesting, swirling color imagery, though it effectively treats the request as an artistic description rather than a literal new hue.
Overall, the workflow demonstrates that GPT-4 can automate a large portion of Midjourney promptcraft—especially the syntax and parameter scaffolding—while Midjourney still enforces practical constraints on how precisely it can follow intricate, multi-object, or highly specific instructions. The most reliable gains come from simpler, single-subject prompts; the biggest failures cluster around overstuffed prompts and tightly specified composition requirements.
Cornell Notes
GPT-4 can be used as a Midjourney prompt generator: it takes a user’s plain-language idea and outputs a ready-to-run “/imagine prompt” command plus Midjourney V5 parameters like aspect ratio (“--ar”) and stylization (“--s”). In tests, even one-word inputs such as “cows” become structured prompts that Midjourney can execute, producing coherent scenes. As requests become more complex (multiple characters, strict spatial intent, or long cinematic narratives), Midjourney sometimes misinterprets details, causing character mixing or composition drift. The workflow is most effective for promptcraft automation, but Midjourney’s own limitations still cap how precisely it can follow intricate instructions.
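The “ready-to-run” shape the notes describe can be summarized as a tiny formatter. The helper name and its defaults are illustrative assumptions, not anything shown in the video; only the `--ar` and `--s` flags come from the source:

```python
from typing import Optional

def imagine_command(description: str, ar: str = "16:9",
                    stylize: Optional[int] = None) -> str:
    """Assemble a Midjourney V5 /imagine command from its parts.

    Illustrative helper: the default 16:9 aspect ratio mirrors the
    "cows" example; stylization is appended only when given.
    """
    command = f"/imagine prompt: {description} --ar {ar}"
    if stylize is not None:
        command += f" --s {stylize}"
    return command
```

For example, `imagine_command("A serene pasture with grazing cows", stylize=2)` yields `/imagine prompt: A serene pasture with grazing cows --ar 16:9 --s 2`.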
How does GPT-4 turn a simple idea into a Midjourney-ready command?
What kinds of inputs produce the most reliable results?
Where does the workflow break down?
How does prompt length and detail affect outcomes?
What do the Mario and “never before seen color” experiments reveal?
Review Questions
- When GPT-4 generates Midjourney prompts, which parameters are emphasized in the system setup, and how do they appear in the final “/imagine” output?
- Give one example where Midjourney misread a GPT-4 prompt detail (goldfish scale, character mixing, or cinematic narrative). What was the mismatch?
- Why might longer, “advanced” prompts lead to worse fidelity in multi-object scenes? Use one of the character-combination examples to support your answer.
Key Points
1. GPT-4 can automate Midjourney promptcraft by outputting a ready-to-run “/imagine prompt” command with Midjourney V5 parameters.
2. Including Midjourney documentation inside GPT-4’s system prompt helps it format prompts correctly and consistently.
3. Single-subject, simpler requests (like “cows” or a whimsical cat) produce more reliable, coherent images.
4. Highly specific spatial instructions (like a goldfish filling the tank) can be interpreted loosely, leading to composition drift or unintended distortion.
5. Overstuffed, multi-character, or long “cinematic” prompts increase the likelihood of character confusion and narrative mismatch.
6. Midjourney can still deliver strong results when the creative transformation is clear, as in “Mario as a villain.”
7. Even when GPT-4 invents abstract concepts (like a “never before seen color”), Midjourney tends to render them as artistic descriptions rather than a literal new hue.