
NEW DALL-E 2 Prompt Strategies for Text to Image AI

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Write prompts with concrete details about medium, camera, and subject placement to reduce ambiguity and improve scene fidelity.

Briefing

Text-to-image results improve dramatically when prompts are treated like precise design briefs rather than casual sentences. The most practical takeaway is that long, detailed prompts—down to medium, camera, subject specifics, and even punctuation—help models lock onto the intended scene. A simple example (a cat wearing a sombrero riding a donkey in a desert) becomes far more controllable when the prompt specifies a “vintage photo,” a “Pentax k1000,” and where the sombrero sits. Small tweaks also matter: moving key words earlier in the prompt and removing confusing fragments can shift outputs from muddled interpretations (like the model accidentally turning the cat into the donkey) to clearer, more usable imagery.

A second strategy focuses on disambiguating brand and style references. “Designed by Apple” can trigger the fruit instead of the company, so the prompt needs guardrails such as “Apple Inc” or additional context like “product photo” and “concept.” Adding “other minimalist companies” helps steer the model toward the intended corporate design language. The payoff is a more consistent aesthetic: minimalist, product-like controller concepts with the kind of silvery-white palette associated with Apple hardware.

Logo generation is treated as a third, almost accidental superpower. Even prompts that aren’t about logos—like minimalist controller concepts—often get converted into logo design concepts automatically. The results can be surprisingly usable as starting points: a controller-inspired Apple-like logo, a dinosaur dating-site logo, a sewer-monster logo, and even logo ideas for niche brands and social profiles. The underlying lesson is that text-to-image systems can reframe a request into a graphic design format with minimal prompting.

Beyond creation, the transcript highlights editing workflows that make DALL-E 2 function like a lightweight Photoshop alternative. Using DALL-E 2’s edit/inpainting tools, users can erase part of an image and supply a replacement prompt—turning a phone into “Darth Vader holding a banana,” for instance. The same approach can be applied to personal images (with constraints noted in the transcript, such as avoiding photorealistic people) to generate stylized character elements like a “cool lemon character” with sunglasses and bright teeth.

Finally, DALL-E 2’s inpainting is framed as a way to complete unfinished artwork. Artists can submit a partially drawn piece, erase the problematic section (a hand, face, or body area), and ask the model to extend the image while matching the existing style. Experiments include completing a picnic scene, fixing a hand area that repeatedly turned into a phone-holding figure, and refining larger erased regions like legs and dress details. The consistent theme: when the erased area is small and the prompt is specific enough, DALL-E 2 can produce coherent continuations that artists can trace, sketch over, or use as compositional references—turning stuck moments into workable drafts.

Cornell Notes

The transcript lays out five prompt strategies for text-to-image systems, with DALL-E 2 as the main focus. The biggest improvement comes from writing long, specific prompts: include medium (e.g., “vintage photo”), camera details (e.g., “Pentax k1000”), subject attributes, and even punctuation or sentence structure. Brand/style references need disambiguation—“Designed by Apple Inc” and “product photo concept” steer the model away from the fruit and toward the company aesthetic. DALL-E 2 also supports Photoshop-like editing through inpainting: erase part of an image and prompt what should replace it. That same inpainting capability can complete unfinished artwork by matching the existing style, colors, and composition.

Why do longer prompts tend to produce better text-to-image results?

Longer prompts reduce ambiguity about medium and composition. In the example, “cat wearing a sombrero riding on top of a donkey in the middle of the desert” becomes more controllable when the prompt adds “photo taken on a Pentax k1000,” specifies the sombrero placement (“on his head”), and frames it as “vintage photo.” The transcript also notes that prompt structure can matter: placing key words earlier in the prompt often increases the chance the model prioritizes them, and punctuation (commas/periods) may help break up phrases for some generators.
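The detail-and-ordering strategy above can be sketched as simple prompt assembly: build the prompt from explicit components (medium, subject, placement, setting, camera) and join them in priority order, most important first. The component names and the `build_prompt` helper are illustrative, not part of any API.

```python
def build_prompt(medium, subject, placement, setting, camera=None):
    """Join prompt components in priority order, separated by commas.

    Putting the medium first mirrors the transcript's advice that
    earlier words tend to get prioritized by the model.
    """
    parts = [medium, subject, placement, setting]
    if camera:
        # Camera details narrow the rendering style further.
        parts.append(f"photo taken on a {camera}")
    return ", ".join(parts)


short_prompt = "a cat wearing a sombrero riding a donkey in a desert"
detailed_prompt = build_prompt(
    medium="vintage photo",
    subject="a cat wearing a sombrero on his head",
    placement="riding on top of a donkey",
    setting="in the middle of the desert",
    camera="Pentax k1000",
)
print(detailed_prompt)
```

The detailed version fixes the medium, pins the sombrero to the cat’s head, and adds a camera, removing the ambiguity that let the model merge the cat and donkey.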

How should prompts handle brand names to avoid unintended interpretations?

Brand terms can be misread as everyday objects. “Designed by Apple” risks producing an apple-fruit controller, so the prompt should clarify the company using “Apple Inc.” The transcript pairs this with additional context like “product photo” and “concept,” which helps the model treat the output as a product design rather than a literal fruit-themed item. Adding “other minimalist companies” further supports the intended corporate minimalist style.
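The brand-guardrail idea can be expressed as a small preprocessing step: swap ambiguous brand words for their full company names and append steering context. The `GUARDS` mapping and `disambiguate` helper are hypothetical; only the replacement strings themselves come from the transcript.

```python
# Ambiguous brand word -> explicit company name (illustrative mapping).
GUARDS = {"Apple": "Apple Inc"}


def disambiguate(prompt, context="product photo, concept"):
    """Replace ambiguous 'designed by <brand>' phrases and append
    context that frames the output as a product design."""
    for ambiguous, explicit in GUARDS.items():
        prompt = prompt.replace(f"designed by {ambiguous}",
                                f"designed by {explicit}")
    return f"{prompt}, {context}"


guarded = disambiguate("a minimalist game controller designed by Apple")
print(guarded)
```

Phrases like “and other minimalist companies” could be appended the same way to reinforce the corporate design language.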

What makes logo generation different from general image generation?

Logo-focused outputs can emerge even when the prompt isn’t explicitly about logos. The transcript shows that a minimalist controller concept can quickly transform into a logo design concept, producing a controller-shaped Apple-like mark. Similar behavior appears with other niche ideas: a dinosaur dating-site logo, a sewer-monster logo, and social-profile logo concepts. The practical takeaway is to treat logo generation as a strong default mode for these models and use it as a starting point.

How does DALL-E 2 enable “Photoshop-like” editing?

DALL-E 2 can edit existing images via an inpainting workflow. The user generates an image, clicks edit, highlights an area to erase, and then provides a replacement prompt. One example erases a phone and replaces it with “Darth Vader holding a banana,” producing a new version of the image where the erased region is filled in according to the prompt. The transcript also emphasizes that uploads have constraints (e.g., avoiding photorealistic people) while still demonstrating stylized edits like turning a lemon image into a “lemon character” with sunglasses and bright teeth.
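Programmatically, the same erase-and-replace workflow can be sketched as assembling the parameters for an image-edit request. The field names below follow the shape of the OpenAI Images edit endpoint (image, mask, prompt, n, size), but treat the exact client call as an assumption and check the current API documentation before relying on it; `build_edit_request` is a hypothetical helper.

```python
def build_edit_request(image_path, mask_path, prompt,
                       model="dall-e-2", n=1, size="1024x1024"):
    """Collect the arguments for an inpainting edit: the original image,
    a mask whose transparent pixels mark the erased region, and the
    replacement prompt describing what should fill that region."""
    return {
        "model": model,
        "image": image_path,  # in a real call, an open binary file handle
        "mask": mask_path,    # transparent pixels = area to regenerate
        "prompt": prompt,
        "n": n,
        "size": size,
    }


request = build_edit_request(
    "scene.png", "scene_mask.png",
    "Darth Vader holding a banana",
)
# In practice these parameters would be passed to an API client, with
# the image and mask files opened in binary mode.
```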

How can inpainting help when an artist is stuck on a specific part of a drawing?

Inpainting can complete missing or weak sections while attempting to match the original style. The transcript describes erasing problematic areas—like a hand, face, or body region—and prompting for a continuation (e.g., completing legs and dress details). It also shows failure modes: a vague prompt like “self-portrait” repeatedly led to the depicted person holding a phone, even though the prompt didn’t specify that. More targeted prompts (e.g., “pair of legs”) produced better style-consistent completions.

Review Questions

  1. Give an example of how you would rewrite a short prompt into a long, detailed one to control medium, camera style, and subject placement.
  2. What changes would you make to a prompt to ensure “Apple” refers to the company rather than the fruit?
  3. Describe an inpainting workflow for editing an image: what gets erased and what kind of prompt should replace it?

Key Points

  1. Write prompts with concrete details about medium, camera, and subject placement to reduce ambiguity and improve scene fidelity.

  2. Reorder prompts so the most important descriptors appear earlier, increasing the chance the model prioritizes them.

  3. Disambiguate brand references using company names like “Apple Inc” and add context such as “product photo concept” to steer style.

  4. Treat logo generation as a strong default capability; even non-logo prompts can yield logo-style outputs that work as starting points.

  5. Use DALL-E 2’s edit/inpainting tools by erasing a selected region and prompting what should replace it, enabling Photoshop-like transformations.

  6. For unfinished art, erase only the problematic area and use specific prompts that encourage style and color matching rather than vague descriptions that can cause repeated unintended elements.

Highlights

Long, detailed prompts—down to camera and medium—can turn a confusing scene into a clearer, more controllable image.
“Designed by Apple” can drift toward the fruit; “Designed by Apple Inc” plus “product photo concept” helps lock onto the intended company aesthetic.
DALL-E 2’s inpainting lets users erase part of an image and replace it with a new concept, such as “Darth Vader holding a banana.”
Inpainting can complete unfinished artwork, but vague prompts (like “self-portrait”) may introduce recurring, unintended objects (like a phone).
