
The Best Model For Frontend Design Is...

Theo - t3.gg · 6 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Opus 4.5 produced the most usable front-end designs only after being paired with the open-source front-end design skill (a markdown behavior guide).

Briefing

Front-end design quality from frontier models depends less on raw “design ability” and more on whether the model is steered with a dedicated design skill—especially for Opus 4.5. In side-by-side tests building a marketing homepage for an “Imagen Studio” app (T4 canvas), Opus 4.5 produced the most usable, non-generic layouts only after being paired with an open-source “front-end design skill” (a markdown behavior guide). Without that skill, Opus outputs leaned into familiar slop patterns—purple/blue gradients, repetitive layouts, and broken or unreadable typography—making it easy to see why many people rank it low for design.

The experiment ran multiple “treatments” across models: a default prompt that asked for organic built-in design without special skills, and a second prompt that explicitly instructed the model to use front-end design skill behavior to make outputs exceptional. For Gemini 3 Pro, the story was mixed: by default it often generated attractive designs with a distinct aesthetic, but its tool/harness experience was unreliable and sometimes crashed or hallucinated errors. Adding the design skill improved Gemini’s variety and usability in places, yet it still showed recurring issues—readability problems, layout mistakes, and occasional “cringe” visual choices—suggesting the skill helps but doesn’t fully overcome Gemini’s execution quirks.

GPT 5.2 landed in the middle. It could generate design directions that looked more editorial or text-forward, but the structure across iterations stayed similar, and some versions suffered from practical UI problems like clipping and low legibility. In the transcript’s comparisons, GPT 5.2 never matched the “hit rate” Opus achieved once the design skill was applied.

The most striking results came from Opus 4.5. In the “worst case” (default prompt), the first set of designs included unreadable text, harsh gradients, and repeated structural motifs. After enabling the front-end design skill, Opus outputs jumped to a different tier: cleaner UX, more intentional layout choices, better handling of typography and spacing, and even UI elements like a design-switcher control that weren’t present before. The gap between “no skill” and “with skill” was described as “insane,” with multiple iterations producing starting points that were not just pretty but also workable.

A final stress test focused on iteration. Gemini’s designs that looked good initially often failed to improve when asked to iterate based on what was liked; follow-ups drifted back into template-like pattern matching and ignored the user’s preferences. Opus, by contrast, showed higher “malleability”: when given two preferred Gemini designs as references, Opus produced new variations that retained the intended direction more consistently. The overall takeaway from the poll and follow-up iterations: Gemini can be strong at first-pass aesthetics, but Opus paired with the front-end design skill is more reliable for steering, refining, and producing designs that can evolve into something genuinely usable.

The practical “how-to” portion frames the skill as simple markdown that can be copied into various coding/agent tools. The transcript points to a centralized skills directory (including a “front-end design skill” that stays near the top of a leaderboard) and shows how to install it globally or per-project, then select which tools it should apply to. The core claim is blunt: a markdown skill file can unlock design behavior that appears hidden in Opus otherwise, turning a model many people dismiss for design into a dependable front-end design workhorse.

Cornell Notes

The transcript argues that front-end design quality from frontier models improves dramatically when a model is paired with a dedicated “front-end design skill” (an open-source markdown behavior guide). In tests building a marketing homepage for an “Imagen Studio” app, Opus 4.5 performed poorly on design without the skill—showing issues like purple/blue gradients, repetitive layouts, and broken typography. With the skill enabled, Opus produced markedly more usable, intentional designs and better UX details, plus higher success when iterating based on what was liked. Gemini 3 Pro often looked good by default but suffered from harness/tool reliability problems and weaker preference-following during iteration. GPT 5.2 produced workable designs but tended to repeat structure and sometimes clipped or reduced readability.

Why did Opus 4.5 look “bad at design” in the default setup, and what changed after adding the front-end design skill?

In the default prompt, Opus outputs included common “AI slop” patterns: harsh purple/blue gradients, similar layout shapes across variants, unreadable or poorly contrasted text (e.g., “create” text barely visible), and even UI problems like noise textures that didn’t serve the design. After enabling the front-end design skill, Opus produced cleaner, more intentional pages—better typography/spacing, more coherent component choices, and even UX elements like a bottom control to switch between the five generated designs. The transcript emphasizes the jump as “insane,” with multiple iterations becoming usable starting points rather than just pretty failures.

What exactly is the front-end design skill, and how does it steer model behavior?

The skill is described as a markdown file that acts like reusable instruction context—behavior the model should follow when generating front-end interfaces. Its guidance stresses “intentionality” over intensity and pushes the model to choose a clear conceptual direction, then execute with precision. It also includes explicit “don’t do this” rules: avoid generic AI aesthetics like overused font families (e.g., Roboto/Inter/system fonts), cliché purple gradients on white backgrounds, predictable layouts, and cookie-cutter designs. It encourages varying themes, fonts, and aesthetics so designs don’t converge on the same template.
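The transcript does not reproduce the skill verbatim, but its described rules might be sketched as a markdown file along these lines. Everything below — the front-matter fields, the filename `SKILL.md`, and the wording — is an illustrative assumption, not the actual open-source skill text:

```shell
# Write a sketch of such a markdown behavior guide to a local file.
# The content paraphrases the rules described above; the real skill differs.
cat > SKILL.md <<'EOF'
---
name: frontend-design
description: Behavior guide for generating front-end interfaces
---

Prioritize intentionality over intensity: commit to one clear
conceptual direction per design, then execute it with precision.

Avoid generic AI aesthetics:
- Overused font families (Roboto, Inter, default system stacks).
- Cliché purple gradients on white backgrounds.
- Predictable, cookie-cutter layouts repeated across variants.

Vary themes, fonts, and aesthetics between variants so designs
do not converge on the same template.
EOF
```

Because the whole skill is plain markdown, "installing" it anywhere is just copying this file into whatever location a given agent tool reads instructions from.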

How did Gemini 3 Pro compare with Opus when the skill was on versus off?

By default, Gemini 3 Pro often produced attractive designs with a distinct look, but the harness/tool experience was unreliable: it sometimes got stuck, sometimes crashed, and sometimes hallucinated errors (e.g., a clone-related runtime issue). With the front-end design skill, Gemini’s outputs became more varied and sometimes more usable, yet recurring problems remained: readability issues, layout/detail mistakes, and occasional “cringe” visual choices (including extreme blur effects). The transcript’s conclusion is that Gemini can be strong for first-pass aesthetics, but it needs additional tuning and still struggles with consistent execution.

What pattern emerged from GPT 5.2’s iterations?

GPT 5.2 produced workable designs, but the transcript notes two recurring weaknesses: (1) iterations often moved toward an editorial/news-like direction with heavy text, and (2) structure stayed similar across versions, suggesting template-like behavior. Some versions also had practical UI issues such as clipping or low legibility. Even when GPT 5.2 was competent at generating a direction, it didn’t match Opus’s reliability for producing distinct, usable starting points.

Why did iteration based on “what people liked” work better with Opus than with Gemini?

When asked to iterate, Gemini’s follow-ups often failed to preserve the liked direction. The transcript describes Gemini as drifting back into template application—good first outputs, then weaker preference-following when generating new variations. In contrast, Opus showed higher “malleability”: after identifying two Gemini favorites, Opus generated new designs that clearly borrowed the intended direction and improved the chance of producing workable starting points. The poll and follow-up iterations both supported this preference for Opus+skill.

How can the skill be installed and applied across tools?

The transcript frames the skill as easy to manage because it’s “just markdown.” It points to a centralized “agent skills” directory (with a front-end design skill near the top of a leaderboard) and instructs copying the skill into a terminal workflow. Users can choose whether to apply it globally or per-project, and then select which tools it should affect (the transcript mentions adding tools like Cursor). The practical message: once installed, the skill steers the model’s front-end generation behavior without requiring code changes.
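As a concrete sketch of global versus per-project installation, assuming the per-skill-folder-with-`SKILL.md` convention used by Claude Code (the directory paths here are assumptions; the transcript does not name them, and other tools such as Cursor keep their own rule locations):

```shell
# Placeholder skill file for demonstration; in practice this would be the
# front-end design skill copied from the skills directory.
printf '# Front-end design skill\n' > SKILL.md

# Global install: applies across all projects on this machine
# (assumes the ~/.claude/skills layout used by Claude Code).
mkdir -p ~/.claude/skills/frontend-design
cp SKILL.md ~/.claude/skills/frontend-design/SKILL.md

# Per-project install: lives inside the repository and applies only there.
mkdir -p .claude/skills/frontend-design
cp SKILL.md .claude/skills/frontend-design/SKILL.md
```

Since no code changes are involved, switching scopes is just moving the same markdown file between the global and project directories.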

Review Questions

  1. In the transcript’s comparisons, what specific visual or UI failures appeared in Opus 4.5 outputs when the front-end design skill was not used?
  2. What evidence suggests Gemini’s strength is more about first-pass aesthetics than about preference-following during iteration?
  3. How does the front-end design skill’s “avoid generic aesthetics” guidance relate to the differences seen between the “with skill” and “without skill” outputs?

Key Points

  1. Opus 4.5 produced the most usable front-end designs only after being paired with the open-source front-end design skill (a markdown behavior guide).

  2. Default prompts without the skill led Opus toward recognizable “AI slop” patterns like purple/blue gradients, repetitive layouts, and broken or unreadable typography.

  3. Gemini 3 Pro often generated attractive designs by default, but harness/tool reliability issues (stalls, crashes, hallucinated errors) limited practical use.

  4. GPT 5.2 delivered workable designs but tended to repeat structure and sometimes introduced legibility or clipping problems.

  5. The front-end design skill steers models away from generic aesthetics by enforcing intentional direction, varied themes/fonts, and non-cookie-cutter layouts.

  6. Iteration based on user preferences worked better with Opus+skill than with Gemini, which often drifted back to template-like pattern matching.

  7. The skill can be installed by copying markdown into agent/tool skill directories and applying it globally or per-project.

Highlights

  • The biggest swing came from Opus 4.5: “without skill” designs were frequently marred by gradients, unreadable text, and repetitive structure; “with skill” outputs became genuinely usable starting points.
  • Gemini 3 Pro looked good in first passes, but its harness/tool behavior was described as broken enough to undermine reliability, and iteration often ignored what was liked.
  • The front-end design skill’s rules explicitly target common AI design tropes—fonts, gradients, predictable layouts—pushing models toward intentional, context-specific choices.
  • Iteration tests suggested Opus+skill better preserves user intent, while Gemini’s follow-ups often revert to baked-in templates.

Topics

  • Frontend Design Skills
  • Opus 4.5
  • Gemini 3 Pro
  • GPT 5.2
  • Agent Skills Setup