JSON: How I Build Perfect Images in NanoBanana Pro

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.

TL;DR

NanoBanana Pro is positioned as a correctness-first renderer, and JSON schemas supply the structured parameters that make outputs consistent.

Briefing

NanoBanana Pro’s edge comes from pairing its “correctness-first” renderer with JSON prompting—turning image generation from a vibes-driven process into something closer to a governed, testable design pipeline. Instead of letting a model freestyle, JSON supplies machine-readable parameters that lock down high-stakes details like camera behavior, lighting, UI layout, and component rules. That precision matters most when the output must be consistent across runs, reviewable by humans, and compatible with professional workflows where changes need to be tracked and reproduced.
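
As a rough sketch (not NanoBanana Pro's actual schema), a structured prompt of this kind might look like the following, with every field name invented for illustration:

    import json

    # Hypothetical prompt payload: field names are illustrative only,
    # not NanoBanana Pro's documented schema.
    prompt = {
        "scene": "beverage can on a studio table",
        "camera": {"angle_deg": 15, "focal_length_mm": 50},
        "lighting": {"key": "softbox left", "fill": "low", "color_temp_k": 5600},
        "layout": {"subject_position": "center", "negative_space": "right third"},
    }

    print(json.dumps(prompt, indent=2))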

The core workflow starts with plain-English intent. A JSON translator converts that intent into a structured JSON schema that NanoBanana Pro can interpret. The pitch isn’t that JSON is universally “the only correct way” to prompt models—models can follow many prompting styles—but that JSON is uniquely useful when the creator is confident about what must be specified. In marketing and product contexts, small deviations can break brand consistency or usability. JSON becomes a way to encode those constraints explicitly: a beverage can’s exact look, lighting requirements, UI color targets, or even accessibility rules.
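
A minimal sketch of that translation step is shown below; the translate_intent helper and the fields it emits are assumptions standing in for the LLM that actually performs the conversion:

    # Sketch of the plain-English -> JSON translation step. The
    # translate_intent helper is hypothetical; in practice a model
    # performs this conversion and returns the structured schema.
    def translate_intent(intent: str) -> dict:
        # Hard-coded example output for one intent, to show the shape.
        return {
            "intent": intent,
            "brand": {"can_color_hex": "#D81E2C", "logo_position": "upper center"},
            "lighting": {"style": "soft daylight"},
            "constraints": {"no_text_distortion": True},
        }

    schema = translate_intent("hero shot of our cola can on a beach at golden hour")
    print(schema["brand"]["can_color_hex"])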

A major benefit is compositional control. With JSON schemas, creators can pivot a camera around the same scene, swap themes, and change layouts while keeping stable handles for key elements. The transcript describes separating subject versus environment, assigning component IDs in UI, and then regenerating while touching only one field—effectively enabling scoped mutations rather than re-rolling the entire image. That “stable handle” concept is presented as the reason NanoBanana Pro can support repeatable iteration.
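
In code terms, a scoped mutation might look like the sketch below, where only the camera field is touched and the subject and environment handles stay fixed (field names are illustrative):

    import copy, json

    base = {
        "environment": {"setting": "studio", "backdrop": "gray seamless"},
        "subject": {"id": "can_01", "type": "beverage can"},
        "camera": {"azimuth_deg": 0, "elevation_deg": 10},
    }

    # Scoped mutation: copy the schema and touch only one field,
    # leaving every other handle (subject, environment) untouched.
    variant = copy.deepcopy(base)
    variant["camera"]["azimuth_deg"] = 45

    print(json.dumps(variant["camera"]))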

NanoBanana Pro is also framed as multi-grammar: it can render photo-like outputs, diagrams, and UI designs, even though those domains use different visual vocabularies. JSON schemas help by pinning down the underlying entities and their rigid relationships for each domain. The shared pattern across domains is the same: structured blobs with named fields that the system must honor. In practice, that means the same disciplined approach can drive marketing images, diagrams, and interfaces—without relying on the model’s open-ended creativity.
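
A hypothetical illustration of that shared pattern, with one schema per grammar and all field names assumed:

    # Two different "grammars" expressed with the same pattern of
    # named fields the renderer must honor (illustrative structures).
    ui_schema = {
        "grammar": "ui",
        "components": [
            {"id": "nav_bar", "type": "navbar"},
            {"id": "cta_button", "type": "button", "label": "Sign up"},
        ],
    }

    diagram_schema = {
        "grammar": "diagram",
        "nodes": [{"id": "api"}, {"id": "db"}],
        "edges": [{"from": "api", "to": "db", "label": "reads/writes"}],
    }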

The transcript argues that this structure is what makes NanoBanana Pro suitable for serious product stacks, where reproducibility, diffing, and version control are non-negotiable. JSON schemas can be versioned (e.g., comparing V3 vs V4), enabling teams to see exactly what changed between runs. Constraints can be enforced in the schema, such as minimum UI tap target sizes (44 pixels) and accessibility requirements.
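
Because the schema is plain structured text, standard tooling can diff revisions. A small sketch using Python's difflib, with hypothetical V3/V4 contents:

    import difflib, json

    # Hypothetical V3 and V4 of the same schema; diffing works because
    # both are stable, structured text rather than free-form prose.
    v3 = {"camera": {"azimuth_deg": 0}, "theme": "light"}
    v4 = {"camera": {"azimuth_deg": 45}, "theme": "light"}

    diff = difflib.unified_diff(
        json.dumps(v3, indent=2).splitlines(),
        json.dumps(v4, indent=2).splitlines(),
        fromfile="schema_v3.json",
        tofile="schema_v4.json",
        lineterm="",
    )
    print("\n".join(diff))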

A practical example demonstrates the flow: a short instruction (“please respond with a filled out JSON template” for a creative alien UI) yields a fully populated JSON schema. After review, the same JSON is reused with an added instruction to adjust presentation (tilt angle) while keeping the structure intact, producing a buildable, reproducible wireframe. The takeaway is less about alien interfaces and more about treating JSON as “pseudo code” that humans can learn to read—so creators can retain their preferred way of describing work (paragraphs or bullets), convert it into structured inputs, and iterate with deterministic control rather than guesswork.
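
A rough reconstruction of that reuse step, with the wireframe contents and the follow-up wording invented for illustration:

    import json

    # The reviewed schema is reused verbatim; only a short follow-up
    # instruction rides alongside it (all content here is illustrative).
    wireframe = {
        "components": [{"id": "contact_btn", "label": "initiate first contact"}],
        "presentation": {"tilt_deg": 0},
    }

    followup = (
        "Keep this JSON exactly as-is, but render the wireframe "
        "tilted forward by 15 degrees."
    )
    prompt = json.dumps(wireframe, indent=2) + "\n\n" + followup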

Cornell Notes

NanoBanana Pro’s workflow improves reliability by combining a correctness-focused renderer with JSON schemas. JSON isn’t presented as universally required, but as a strong fit when creators know what must be specified—especially for marketing images, UI layouts, and diagrams. JSON schemas provide stable handles for key elements, enabling scoped regeneration (change one field without redoing the whole scene) and supporting compositional control like camera pivots and layout swaps. The approach also supports professional engineering needs: version control, diffing between prompt/schema revisions, and enforceable constraints such as UI tap target minimums and accessibility rules. A demonstrated example shows short text leading to a fully filled JSON template that can be reviewed and regenerated into a reproducible wireframe.

Why does JSON prompting matter more for some tasks than others?

JSON is most valuable when the creator is confident about what must be fixed. In high-stakes scenarios—like a marketing image that must match a specific beverage can look, lighting, or a UI that must hit exact colors and layout—JSON provides machine-readable parameters that reduce drift. When creativity is the goal and the model should explore, JSON can be counterproductive because it constrains the model’s freedom.

How does JSON enable “scoped mutation” instead of regenerating everything?

JSON schemas give stable handles to important elements (e.g., separating a subject from an environment, or using component IDs in a UI). Once those named fields exist, regeneration can target only one field—so the system can update a single aspect while keeping the rest consistent. The transcript frames this as a key reason NanoBanana Pro shines: it avoids turning the entire scene over to the model again.
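
A tiny sketch of what targeting a single component by ID could look like, using a hypothetical update_component helper:

    # Target one component by its ID while leaving siblings alone
    # (hypothetical helper; the renderer consumes the resulting schema).
    def update_component(schema: dict, component_id: str, **changes) -> dict:
        for comp in schema["components"]:
            if comp["id"] == component_id:
                comp.update(changes)
        return schema

    schema = {"components": [{"id": "cta_button", "color": "#0044CC"},
                             {"id": "nav_bar", "color": "#FFFFFF"}]}
    update_component(schema, "cta_button", color="#D81E2C")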

What does “compositional control” look like in this workflow?

Compositional control means using structured parameters to vary composition while preserving the same underlying scene logic. The transcript describes pivoting a camera around the same scene, swapping themes, and changing layouts. JSON makes those variations explicit through human-readable properties, so the creator can systematically control what changes and what stays stable.
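
For example, a camera pivot series might be generated by varying a single field while holding the rest of the schema constant (a sketch with assumed field names):

    # Pivoting the camera around the same scene: only the azimuth varies,
    # every other field is held constant (values are illustrative).
    base = {"subject": {"id": "can_01"}, "camera": {"azimuth_deg": 0}}

    variants = []
    for azimuth in (0, 90, 180, 270):
        v = {**base, "camera": {**base["camera"], "azimuth_deg": azimuth}}
        variants.append(v)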

How does the approach extend across photos, diagrams, and UI despite different visual grammars?

Even though photo, diagram, and UI domains use different surface-level vocabulary, each domain has core entities and rigid relationships. JSON schemas pin down those entities and relationships for each grammar. Because the system honors named fields, the same structured-input strategy can produce a photo-like output, a diagram, or a UI wireframe while keeping the underlying structure consistent.

What makes the workflow more suitable for professional product stacks?

Professional stacks require reproducibility, diffing, and governance. JSON schemas can be version-controlled, enabling comparisons like “what changed between V3 and V4.” The schema can also encode rules—such as not reducing UI tap targets below 44 pixels—and include accessibility constraints. That turns image generation into something teams can reason about and test rather than a one-off creative output.
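
One way a team might enforce such rules is a pre-flight check over the schema before rendering; the sketch below assumes hypothetical field names like tap_target_px:

    # A small pre-flight check for schema-level constraints such as the
    # 44-pixel tap target minimum (rule values and field names assumed).
    MIN_TAP_TARGET_PX = 44

    def violations(schema: dict) -> list[str]:
        problems = []
        for comp in schema.get("components", []):
            size = comp.get("tap_target_px")
            if size is not None and size < MIN_TAP_TARGET_PX:
                problems.append(f"{comp['id']}: tap target {size}px < {MIN_TAP_TARGET_PX}px")
        return problems

    print(violations({"components": [{"id": "cta_button", "tap_target_px": 32}]}))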

What was demonstrated in the alien UI example, and why is it relevant?

A short instruction requested a filled-out JSON template for a creative alien UI. The model returned a complete JSON schema, including faithful UI content like “initiate first contact.” After review, the same JSON structure was reused with added instruction to adjust the wireframe’s presentation (tilting forward) while keeping the output reproducible. The point wasn’t the alien theme; it was showing how structured JSON can produce buildable, repeatable wireframes.

Review Questions

  1. In what situations would JSON constraints likely reduce quality or creativity, and why?
  2. How do stable handles (like component IDs) change the way iteration works compared with free-form prompting?
  3. What kinds of governance needs (diffing, version control, accessibility rules) does JSON schema support in this workflow?

Key Points

  1. NanoBanana Pro is positioned as a correctness-first renderer, and JSON schemas supply the structured parameters that make outputs consistent.
  2. JSON prompting is most effective when the creator already knows what must be specified (e.g., brand-accurate marketing visuals or exact UI layout and colors).
  3. JSON schemas enable compositional control by making camera, layout, and element variations explicit through named fields.
  4. Stable handles in the schema allow scoped regeneration—changing one field without re-rolling the entire scene.
  5. Version control and diffing become practical when prompts and schemas are represented as structured, comparable JSON revisions.
  6. Accessibility and usability constraints (like minimum 44-pixel tap targets) can be encoded directly into the JSON schema.
  7. A translator workflow lets humans write plain-English instructions while still producing strict JSON that can be reviewed and reused.

Highlights

  • JSON is framed as a tool for correctness and governance, not a universal rule for all prompting.
  • Stable element handles in JSON let creators regenerate with targeted changes instead of restarting from scratch.
  • The workflow supports professional needs like versioning, diffing, and enforceable UI constraints (including 44-pixel tap targets).
  • A short text prompt can generate a fully populated JSON template that then yields reproducible wireframes after review and minor adjustments.

Topics

  • JSON Prompting
  • NanoBanana Pro
  • Compositional Control
  • Reproducible Wireframes
  • Accessibility Constraints
