NEW DALL-E 2 Prompt Strategies for Text-to-Image AI
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Write prompts with concrete details about medium, camera, and subject placement to reduce ambiguity and improve scene fidelity.
Briefing
Text-to-image results improve dramatically when prompts are treated like precise design briefs rather than casual sentences. The most practical takeaway is that long, detailed prompts—down to medium, camera, subject specifics, and even punctuation—help models lock onto the intended scene. A simple example (a cat wearing a sombrero riding a donkey in a desert) becomes far more controllable when the prompt specifies a “vintage photo,” a “Pentax K1000,” and where the sombrero sits. Small tweaks also matter: moving key words earlier in the prompt and removing confusing fragments can shift outputs from muddled interpretations (like the model accidentally turning the cat into the donkey) to clearer, more usable imagery.
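To make the contrast concrete, here is a minimal sketch of running both the terse and the detailed prompt through the OpenAI Python SDK. The prompt strings are illustrative paraphrases of the video's example, and the code assumes an OPENAI_API_KEY is configured in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Terse prompt: medium, camera, and sombrero placement are left to chance.
short_prompt = "a cat wearing a sombrero riding a donkey in a desert"

# Detailed prompt: pins down medium, camera, and subject placement.
long_prompt = (
    "A vintage photo taken on a Pentax K1000 of a cat riding a donkey "
    "through a desert, with the sombrero sitting on the cat's head."
)

for prompt in (short_prompt, long_prompt):
    response = client.images.generate(
        model="dall-e-2",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )
    print(f"{prompt!r} -> {response.data[0].url}")
```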
A second strategy focuses on disambiguating brand and style references. “Designed by Apple” can trigger the fruit instead of the company, so the prompt needs guardrails such as “Apple Inc” or additional context like “product photo” and “concept.” Adding “other minimalist companies” helps steer the model toward the intended corporate design language. The payoff is a more consistent aesthetic: minimalist, product-like controller concepts with the kind of silvery-white palette associated with Apple hardware.
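Reusing the `client` from the sketch above, the disambiguated wording can be tested directly; the prompt strings below are reconstructions of the video's phrasing rather than exact quotes:

```python
# Ambiguous: "Apple" may resolve to the fruit.
ambiguous = "a video game controller designed by Apple"

# Guardrails: full company name plus context that implies industrial design.
disambiguated = (
    "Product photo concept of a minimalist video game controller, "
    "designed by Apple Inc and other minimalist companies"
)

response = client.images.generate(
    model="dall-e-2",
    prompt=disambiguated,
    n=4,            # request several variations to compare aesthetics
    size="512x512",
)
print([image.url for image in response.data])
```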
Logo generation is treated as a third, almost accidental superpower. Even prompts that aren’t about logos—like minimalist controller concepts—often get converted into logo design concepts automatically. The results can be surprisingly usable as starting points: a controller-inspired Apple-like logo, a dinosaur dating-site logo, a sewer-monster logo, and even logo ideas for niche brands and social profiles. The underlying lesson is that text-to-image systems can reframe a request into a graphic design format with minimal prompting.
Beyond creation, the transcript highlights editing workflows that make DALL-E 2 function like a lightweight Photoshop alternative. Using DALL-E 2’s edit/inpainting tools, users can erase part of an image and supply a replacement prompt—turning a phone into “Darth Vader holding a banana,” for instance. The same approach can be applied to personal images (with constraints noted in the transcript, such as avoiding photorealistic people) to generate stylized character elements like a “cool lemon character” with sunglasses and bright teeth.
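In API terms this maps to the images edit endpoint: upload the original picture plus a mask whose transparent pixels mark the erased region, together with the replacement prompt. A minimal sketch, again with the OpenAI Python SDK; the file names are placeholders, and DALL-E 2 expects square PNG inputs under 4 MB:

```python
from openai import OpenAI

client = OpenAI()

# "phone_photo.png" is the original image; "mask.png" is a copy of it in
# which the region to replace has been made fully transparent.
response = client.images.edit(
    model="dall-e-2",
    image=open("phone_photo.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="Darth Vader holding a banana",
    n=1,
    size="1024x1024",
)
print(response.data[0].url)
```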
Finally, DALL-E 2’s inpainting is framed as a way to complete unfinished artwork. Artists can submit a partially drawn piece, erase the problematic section (a hand, face, or body area), and ask the model to extend the image while matching the existing style. Experiments include completing a picnic scene, fixing a hand area that repeatedly turned into a phone-holding figure, and refining larger erased regions like legs and dress details. The consistent theme: when the erased area is small and the prompt is specific enough, DALL-E 2 can produce coherent continuations that artists can trace, sketch over, or use as compositional references—turning stuck moments into workable drafts.
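The "erase the problematic section" step comes down to making that region transparent in the mask. A hypothetical sketch using Pillow; the coordinates and file names are placeholders for wherever the broken hand or face actually sits:

```python
from PIL import Image

# Start from the work-in-progress drawing and copy it as the mask.
art = Image.open("unfinished_art.png").convert("RGBA")
mask = art.copy()

# Punch a fully transparent hole over the problem area; transparent
# pixels are where the model is allowed to repaint.
hole = Image.new("RGBA", (150, 150), (0, 0, 0, 0))
mask.paste(hole, (300, 500))
mask.save("mask.png")
```

Pairing this mask with a specific, style-matching prompt (e.g., "a hand resting on the picnic blanket, matching the flat pastel illustration style") is what keeps the continuation coherent with the rest of the piece.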
Cornell Notes
The transcript lays out five prompt strategies for text-to-image systems, with DALL-E 2 as the main focus. The biggest improvement comes from writing long, specific prompts: include medium (e.g., “vintage photo”), camera details (e.g., “Pentax K1000”), subject attributes, and even punctuation or sentence structure. Brand/style references need disambiguation—“Designed by Apple Inc” and “product photo concept” steer the model away from the fruit and toward the company aesthetic. DALL-E 2 also supports Photoshop-like editing through inpainting: erase part of an image and prompt what should replace it. That same inpainting capability can complete unfinished artwork by matching the existing style, colors, and composition.
Why do longer prompts tend to produce better text-to-image results?
How should prompts handle brand names to avoid unintended interpretations?
What makes logo generation different from general image generation?
How does DALL-E 2 enable “Photoshop-like” editing?
How can inpainting help when an artist is stuck on a specific part of a drawing?
Review Questions
- Give an example of how you would rewrite a short prompt into a long, detailed one to control medium, camera style, and subject placement.
- What changes would you make to a prompt to ensure “Apple” refers to the company rather than the fruit?
- Describe an inpainting workflow for editing an image: what gets erased and what kind of prompt should replace it?
Key Points
1. Write prompts with concrete details about medium, camera, and subject placement to reduce ambiguity and improve scene fidelity.
2. Reorder prompts so the most important descriptors appear earlier, increasing the chance the model prioritizes them.
3. Disambiguate brand references using company names like “Apple Inc” and add context such as “product photo concept” to steer style.
4. Treat logo generation as a strong default capability; even non-logo prompts can yield logo-style outputs that work as starting points.
5. Use DALL-E 2’s edit/inpainting tools by erasing a selected region and prompting what should replace it, enabling Photoshop-like transformations.
6. For unfinished art, erase only the problematic area and use specific prompts that encourage style and color matching rather than vague descriptions that can cause repeated unintended elements.