
Visual Thinking with AI Art

5 min read

Based on Zsolt's Visual Personal Knowledge Management's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use AI art as a thinking catalyst: generate multiple variants, then connect the images to ideas in mind maps or sketches rather than aiming for perfect final art.

Briefing

AI art is being positioned as a practical tool for visual thinking and personal knowledge management—especially when paired with careful prompting and downstream editing—while still coming with predictable friction around accuracy, consistency, and abstract concepts. The core workflow starts with text-to-image generation: feed a detailed prompt, get multiple image variants, then use the results to stimulate illustration, lateral thinking, and idea mapping rather than treating the images as final artwork.

The transcript compares four image-generation options—two paid and two free—using the same sample prompt: a photorealistic scene of an anthropomorphic robot at a desk in a spaceship's living quarters, holding a stylus and staring at an empty screen. OpenAI's DALL·E offers a trial with 50 credits, producing four variants per prompt (200 images total), followed by 15 free credits monthly and the option to buy additional credits. Midjourney is accessed via a Discord bot; the demo appends "--v 4" to the prompt to select the version 4 model and "--ar 3:2" for a 3:2 aspect ratio suited to thumbnails. A trial provides 25 minutes of GPU time (about 25 images), with four variants per prompt; afterward, users can start a new trial with a different Discord account or subscribe (with a stated basic cost of $8/month when paid annually or $10/month monthly, including 200 GPU minutes).

For free options, Stable Diffusion Online is presented as less consistently impressive but still useful for idea generation, with an emphasis on browsing its prompt database for inspiration and ready-made examples. A local option, DiffusionBee, is described as Mac-only at the time, with a Windows beta reportedly in progress; DALL·E on Google Colab was planned but didn't work reliably after an initial success, so the transcript instead points viewers to prompt-writing guidance and Colab templates.

Four use cases show how AI images can support thinking. First, illustration: prompts can specify style (e.g., Salvador Dalí), medium (pencil drawing), mood (“the will to endure”), and background (white background). When a clean background is needed for integration into mind maps, the workflow recommends generating the image with a white background and then removing it using LunaPic to produce a transparent PNG for insertion into XMind. Second, lateral thinking: the transcript links the technique to Edward de Bono’s concept of breaking linear thought patterns, then suggests generating visual metaphors. One method converts a passage into a short poem (haiku, tanka, sestet, couplet) via GPT-3, then uses that poem as the art prompt; another asks GPT-3 for visual metaphors from a problem statement and feeds the combined metaphor into the image generator, followed by reflection and connection-building in a mind map.

Limitations are treated as part of the method rather than a deal-breaker. Image quality varies, and even “perfect” outputs can contain small errors like malformed hands or misplaced eyes. Abstract concepts can be hard to render accurately, though ambiguity may be useful for sparking new ideas. Character consistency is weak—creating a comic strip is difficult because the same character often changes across scenes, even for well-known figures like Mickey Mouse. The transcript also notes difficulty with upscaling low-quality old photos and suggests using GPT-3 to translate abstract ideas into more concrete descriptions when generation fails. Overall, AI art is framed as a thinking catalyst: generate, edit, connect, and iterate—while expecting imperfections.

Cornell Notes

Text-to-image AI is presented as a support system for visual thinking and personal knowledge management. The workflow pairs prompting with downstream use: generate multiple variants, then edit and integrate images into mind maps or illustrations. Four generation tools are compared: DALL·E and Midjourney (paid), Stable Diffusion Online and DiffusionBee (free/local), with DALL·E on Google Colab mentioned as unreliable. Illustration prompts can specify style, medium, and background; LunaPic can remove a white background to create transparent PNGs for mind maps. For lateral thinking, GPT-3 can turn ideas into poems or visual metaphors that then become prompts, producing images that help users make new connections—while accepting limitations like inconsistent characters and occasional rendering errors.

How do the prompt and credit systems differ across DALL·E and Midjourney, and why does that matter for iterative thinking?

DALL·E’s trial grants 50 credits, and each prompt yields four image variants—so the trial produces 200 images total. Afterward, it provides 15 free credits monthly plus the option to buy more credits. Midjourney is accessed through a Discord bot and uses GPU time instead of credits: a trial provides 25 minutes (about 25 images), and each prompt also generates four variants. For visual thinking, this changes how quickly someone can iterate: DALL·E’s credit-based trial can be “spent” on many prompt variations, while Midjourney’s GPU minutes can feel more constrained, encouraging tighter prompt refinement.
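The iteration budgets described above can be sketched as back-of-envelope arithmetic (the ~1 image per GPU minute rate is the transcript's approximation, not an official figure):

```python
# Comparing the two trial budgets: DALL·E meters credits per prompt,
# Midjourney meters GPU minutes per image.
dalle_credits = 50          # trial credits; 1 credit = 1 prompt
variants_per_prompt = 4     # both tools return four variants per prompt

dalle_trial_images = dalle_credits * variants_per_prompt
print(dalle_trial_images)   # 200 images, i.e. 50 prompt iterations

# Midjourney's trial: ~25 GPU minutes at roughly 1 minute per image.
midjourney_trial_images = 25
midjourney_prompts = midjourney_trial_images // variants_per_prompt
print(midjourney_prompts)   # only ~6 prompt iterations
```

The gap in prompt iterations (50 vs. ~6) is what makes the DALL·E trial feel looser for exploratory prompting, while Midjourney's GPU metering rewards refining a prompt before submitting it.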

What’s the practical illustration workflow for turning an AI image into something usable in a mind map?

The transcript recommends prompting for a white background so the subject is easy to isolate. After generation, LunaPic is used to replace the white background with transparency: upload the image, choose Edit, select Transparent Color (or transparent area), and pick the color to remove. Users may need to experiment with transparency settings, then save as a transparent PNG and insert it into XMind for mind-map integration.
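LunaPic is a web tool with no API, but the same white-to-transparent step can be scripted locally. A minimal sketch using Pillow (the function name and threshold are illustrative, not from the transcript):

```python
from PIL import Image

def white_to_transparent(src_path, dst_path, threshold=245):
    """Make near-white background pixels fully transparent and save as PNG."""
    img = Image.open(src_path).convert("RGBA")
    cleaned = [
        (r, g, b, 0) if r > threshold and g > threshold and b > threshold
        else (r, g, b, a)
        for (r, g, b, a) in img.getdata()
    ]
    img.putdata(cleaned)
    img.save(dst_path, "PNG")  # transparent PNG, ready to drop into XMind
```

As with LunaPic's settings, the threshold may need experimentation: too low leaves white halos, too high eats into light areas of the subject.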

How does the transcript connect lateral thinking to AI art in a concrete step-by-step loop?

Lateral thinking (linked to Edward de Bono) is treated as breaking linear thought patterns and making connections between unrelated ideas. One loop starts by writing down a problem, asking GPT-3 for visual metaphors, combining suggestions into a single metaphor, and feeding that into the image generator. After images appear, the user reflects and builds connections in a mind map. Another loop converts a passage into a short poem (haiku/tanka/sestet/couplet) using GPT-3, then uses the poem as the art prompt; the resulting images become prompts for reflection on the original idea.
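The metaphor loop above reduces to a simple composition step: collect LLM-suggested metaphors, merge them, and emit an image prompt. A sketch of that step in isolation (the function, the style suffix, and the example inputs are hypothetical; the actual GPT-3 call is omitted):

```python
def metaphor_image_prompt(problem, metaphors):
    """Combine LLM-suggested visual metaphors into one image prompt.

    In the full loop, `metaphors` would come from asking GPT-3
    "give me visual metaphors for <problem>"; here only the
    combination step is shown.
    """
    combined = " merged with ".join(metaphors)
    return f"{combined}, symbolizing {problem}, digital art, white background"

prompt = metaphor_image_prompt(
    "sustaining motivation on a long project",
    ["a marathon runner at dawn", "a candle burning steadily"],
)
print(prompt)
```

The returned string is then pasted into the image generator, and the resulting variants become material for reflection in the mind map.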

Why is character consistency a major obstacle for comic-strip creation with AI art?

The transcript reports that the AI struggles to keep the same character across multiple scenes. Even when the character is well-defined as a cartoon worm, the character can vary; the issue persists with recognizable characters like Mickey Mouse, where the generated versions don’t match closely enough to maintain visual continuity. That makes multi-panel storytelling visually inconsistent, even if the images are individually appealing.

What kinds of limitations are described as most likely to affect results, and how can they be handled?

Four recurring issues are highlighted: (1) variable image quality, including small errors like malformed hands or misplaced eyes; (2) difficulty rendering abstract concepts, which can be mitigated by translating abstractions into more concrete descriptions via GPT-3; (3) inconsistent characters across scenes, which complicates comics; and (4) limited success when upscaling low-quality old photos. The transcript suggests embracing ambiguity when the goal is ideation, but using more concrete prompting when accuracy matters.

Review Questions

  1. When would it be better to prompt for a white background and later remove it, versus prompting for a complex background directly?
  2. What two different GPT-3-to-image strategies are used to generate lateral-thinking prompts, and how do they differ?
  3. Which limitation would most directly undermine a project that requires repeated characters across panels, and what workaround is implied by the transcript?

Key Points

  1. Use AI art as a thinking catalyst: generate multiple variants, then connect the images to ideas in mind maps or sketches rather than aiming for perfect final art.
  2. DALL·E trial credits and Midjourney GPU minutes both produce four variants per prompt, but the resource model changes how many iterations are practical.
  3. For illustration workflows, prompt for a white background and use LunaPic to create transparent PNGs for easy placement in XMind.
  4. Lateral thinking can be operationalized by converting problems into visual metaphors (via GPT-3) and then reflecting on the resulting images to build new connections.
  5. Abstract concepts often require translation into more concrete descriptions before image generation works reliably.
  6. Expect small rendering errors (hands, eyes) even in high-quality outputs, and plan for iteration or acceptance of imperfections.
  7. Character consistency is weak across scenes, making comics and long-form visual narratives difficult without additional constraints or post-processing.

Highlights

DALL·E’s trial yields 200 images because each of 50 credits can generate four variants per prompt—useful for rapid ideation.
Midjourney’s Discord-bot workflow uses GPU minutes and also returns four variants per prompt, shaping how tightly prompts must be refined.
A reliable integration trick: generate with a white background, remove it with LunaPic, and import a transparent PNG into XMind.
Lateral thinking is turned into a repeatable pipeline by using GPT-3 to create poems or visual metaphors, then using the outputs as image prompts.
AI struggles with character continuity across scenes, even for familiar characters like Mickey Mouse, which complicates comic-strip creation.