
I Got DALL-E 3 Access: My First Impressions and Tests

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Iterating from a selected prior image can preserve the subject while changing environment and outfit, but photoreal style can drift toward painterly backdrops.

Briefing

Access to DALL·E 3 quickly turns into a practical test of whether natural-language prompting can reliably produce photoreal images, iterate on specific subjects, and handle “design tasks” like memes, illustrations, blueprints, and thumbnail-style text layouts. The standout takeaway is that iterative refinement works smoothly, especially when prompts reference an existing image, and that DALL·E 3’s strengths show up most clearly in workflows that feel more conversational than traditional text-to-image prompting.

The first set of experiments focuses on a professional-style portrait: a “stunning American born female” in a 16:9 frame. The initial results look realistic enough to feel closer to photoreal than typical stylized generations, and the user then tries image-based iteration by selecting “image B” and placing the same woman into new scenes. Moving her onto New York streets with a different outfit produces a convincing match in subject and mood, with a notable bokeh-style background. A later prompt for hiking in Norway fjords and mountains keeps the subject strong but shifts the overall look—backdrops start to feel more painterly, and photorealism drops compared with earlier outputs. The user concludes that style drift is possible even when the prompt remains “photo-like,” suggesting prompting details and constraints matter.
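The video does all of this inside the chat interface, but the same kind of request can be scripted against OpenAI’s Images API. A minimal sketch, assuming the openai Python SDK v1.x and an OPENAI_API_KEY in the environment; the prompt text is a stand-in for the video’s wording, and 1792x1024 is the closest size the API offers to a 16:9 frame:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="Professional portrait photo of a woman on a city street, "
           "shallow depth of field, natural light",  # stand-in prompt
    size="1792x1024",    # closest supported size to a 16:9 frame
    quality="standard",  # "hd" trades speed for finer detail
    n=1,                 # dall-e-3 accepts only one image per request
)

print(response.data[0].url)             # temporary URL of the generated image
print(response.data[0].revised_prompt)  # DALL-E 3 rewrites prompts before generating
```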

Next comes a more object-and-scene workflow using a copied prompt from OpenAI: a monkey holding a banana in a chair. Iterations demonstrate DALL·E 3’s ability to preserve scene structure while swapping elements. When the prompt asks for a smaller banana and a pink chair, the chair changes as requested, but the banana size doesn’t clearly follow. Switching the monkey to a “random Pokemon” works better, and a follow-up to make the Pokemon “more realistic and a bit bigger” yields a result the user is especially happy with. The broader point is that natural language makes repeated edits easier than retyping complex prompts each time.
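The conversational editing shown here lives in the chat UI; the Images API has no edit endpoint for dall-e-3. A rough approximation, and only an approximation, is to feed the model’s own revised_prompt back with the change appended. A sketch (the prompts are paraphrases, not the exact wording from the video):

```python
from openai import OpenAI

client = OpenAI()

def generate(prompt: str) -> tuple[str, str]:
    """One DALL-E 3 generation; returns the image URL and the revised prompt."""
    resp = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    return resp.data[0].url, resp.data[0].revised_prompt

# First pass, paraphrasing the copied OpenAI example prompt.
url, revised = generate("A monkey holding a banana while sitting in a chair")

# "Edit" by carrying the revised prompt forward with the change appended.
# Composition can still shift between passes, since each call is independent.
url, revised = generate(revised + " The chair is pink and the banana is smaller.")
```

In the chat workflow the model keeps this context for you, which is why repeated edits feel easier than retyping complex prompts.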

The transcript then pivots to meme generation and structured illustration prompts. A Michael Scott–style meme (“That’s what she said”) produces mixed results—some text/layout attempts fail to fit or read well—but the user finds at least one version genuinely funny and close to the intended joke format. More impressive are “prompt engineering” style tasks: supplying a blog post and asking for coherent illustrations in a 4:3 format generates multiple usable images quickly (including an oracle-like figure, Nostradamus imagery, and a black swan). A similar approach for a “futuristic gaming console” produces a photo, internal components, and especially a blueprint-style image.

Blueprints with labeled parts are attempted next using a human brain prompt. The model generates an aesthetically pleasing diagram and includes readable labels, but accuracy is imperfect (some parts appear duplicated or mislabeled). Finally, the user tests YouTube thumbnail generation with big brand-like text and high-contrast design instructions. Several outputs look promising, but spelling and letter-shape issues remain a recurring limitation—some thumbnails are clever yet misspell “Dollar Tree” or render characters oddly.

Overall, the impression is positive: DALL·E 3 supports a wider range of practical use cases than Midjourney in a chat-based workflow, with natural-language iteration and image editing as the main advantages. A rate limit also appears (a 13-minute wait after spamming generations), which matters for anyone planning heavy experimentation.
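On the API side the equivalent constraint is per-model rate limits, and heavy experimentation usually warrants a retry wrapper. A minimal exponential-backoff sketch (retry counts and wait times are arbitrary choices, not values from the video):

```python
import time

import openai
from openai import OpenAI

client = OpenAI()

def generate_with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Generate one image, backing off and retrying when rate-limited."""
    for attempt in range(max_retries):
        try:
            resp = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
            return resp.data[0].url
        except openai.RateLimitError:
            time.sleep(10 * 2 ** attempt)  # 10s, 20s, 40s, ...
    raise RuntimeError("still rate-limited after retries")
```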

Cornell Notes

DALL·E 3 access is tested through iterative image generation: starting from a photoreal portrait prompt, then reusing “image B” to place the same subject into new scenes like New York and Norway. The results show strong subject continuity, but style can drift—backdrops may shift from photoreal to more painterly looks. Object-and-scene edits work well too, such as changing a monkey to a random Pokemon while keeping the chair and overall composition. Natural-language prompting makes repeated refinement easier than retyping prompts, and structured prompts can generate blog illustrations, futuristic console concepts, and blueprint-style diagrams—though text accuracy (especially spelling) and label precision aren’t guaranteed. Rate limiting also affects how quickly more images can be produced.

How well does DALL·E 3 preserve a subject when iterating from an earlier image?

When the user selects “image B” and asks to place the woman into new settings, the subject remains recognizable even as the environment and outfit change. The New York streets iteration keeps the woman’s identity and produces a convincing backdrop with a bokeh effect. The Norway hiking prompt still keeps the subject strong, but the overall rendering shifts—backdrops start to look more like a painting than a photo, indicating that subject continuity can be better than strict style consistency.

What patterns emerge from the monkey/banana chair experiments?

The chair-and-scene structure largely stays consistent across iterations. The prompt “make the banana smaller and the chair pink” clearly changes the chair color to pink, but the banana size change is not reliably reflected. Switching “the monkey” to a “random Pokemon” works and produces a coherent substitution while keeping the chair. A later prompt to make the Pokemon “more realistic and a bit bigger” improves the result, suggesting that incremental natural-language edits can refine realism and scale.

Why do meme and text-heavy prompts produce mixed outcomes?

Meme attempts rely on fitting text and punchline formatting into a constrained layout. Some generations fail to fit the text cleanly (“way too big to fit here”), and others produce nonsensical or awkward compositions. The user finds one version funnier and closer to the intended joke, but overall the transcript shows that text placement and layout reliability are inconsistent compared with image composition.

How effective are structured prompts for generating illustrations from external content?

When the user provides a blog post and asks for coherent illustrations in a 4:3 format, the model returns multiple relevant images quickly—an oracle-like figure, Nostradamus imagery, a vector-style data graphic, and a black swan. The user then assembles these into a PDF for the blog, implying the workflow is practical for content creation and can be done in minutes.
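Scripted, the same workflow is a loop over scene prompts distilled from the post. A sketch, with hypothetical prompts standing in for whatever the chat distilled from the blog; note the API offers only 1024x1024, 1792x1024, and 1024x1792, so a strict 4:3 deliverable would need cropping afterward:

```python
import urllib.request

from openai import OpenAI

client = OpenAI()

# Hypothetical scene prompts; in the video this distillation happens in chat.
scenes = [
    "An oracle-like figure gazing into swirling mist, editorial illustration",
    "Nostradamus writing prophecies by candlelight, editorial illustration",
    "A single black swan on dark water, minimalist editorial illustration",
]

for i, scene in enumerate(scenes):
    resp = client.images.generate(model="dall-e-3", prompt=scene, n=1)
    urllib.request.urlretrieve(resp.data[0].url, f"illustration_{i}.png")
```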

What limitations show up in blueprint-style labeling tasks?

Blueprint aesthetics can be strong, and labels can appear as readable text (e.g., “brain,” “frontal lobe,” “cerebrum,” “brain stem”). However, accuracy isn’t perfect: labels can be redundant or repeated (for example, both “brain” and “cerebrum” appear as separate parts, and at least one label is used twice). The user likes the look but wants better prompt control to improve labeling precision.

What’s the biggest challenge for YouTube thumbnail generation?

Spelling and character rendering. Several thumbnail concepts use big brand-like text and high-contrast design instructions, but outputs sometimes misspell “Dollar Tree” or render letters strangely (e.g., odd “e” shapes or unclear text). The user likes some designs and considers using them, but the transcript makes clear that text correctness remains a weak point.

Review Questions

  1. When iterating from an earlier image (like “image B”), what evidence suggests subject preservation is stronger than style preservation?
  2. Which experiment shows the most reliable scene-structure retention: the monkey-to-Pokemon chair edits or the portrait scene swaps? Why?
  3. What specific failure modes appear in text-heavy tasks (memes vs thumbnails vs blueprint labels), and how do they differ?

Key Points

  1. Iterating from a selected prior image can preserve the subject while changing environment and outfit, but photoreal style can drift toward painterly backdrops.
  2. Natural-language edits make repeated refinement easier, especially for swapping objects (monkey to Pokemon) while keeping the overall composition.
  3. Size and fine-grained attribute changes (like “banana smaller”) are not always followed precisely, even when other changes (like chair color) land clearly.
  4. Meme-style prompts are sensitive to layout constraints; text that doesn’t fit can break the result or reduce readability.
  5. Structured prompts tied to external content (blog posts) can generate multiple coherent illustrations quickly and are practical for publishing workflows.
  6. Blueprint prompts can produce visually strong diagrams with labels, but label accuracy and duplication issues remain a limitation.
  7. Text-heavy thumbnail generation often struggles with spelling and character rendering, so designs may need manual verification or follow-up prompting.

Highlights

  • Selecting “image B” and re-prompting can place the same person into new scenes, but the background style may shift away from photorealism.
  • Changing a monkey into a “random Pokemon” keeps the chair composition largely intact, and a follow-up realism/scale prompt improves the result.
  • Blog-post illustration prompts can yield multiple usable images in minutes, including oracle, Nostradamus, and black swan concepts.
  • Blueprints can include readable labels, but part names may duplicate or mislabel despite good aesthetics.
  • Thumbnail concepts with big text frequently suffer from spelling or letter-shape errors, even when the design looks clever.

Topics

  • DALL·E 3 Access
  • Image Iteration
  • Natural Language Prompting
  • Blueprint Diagrams
  • YouTube Thumbnails

Mentioned

  • Michael Scott