Confused by o4 vs. o3? My Trick to Remember Each of the 16 Major AI Models

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use semantic meaning—not arbitrary name decoding—to help people remember what each AI model is best at.

Briefing

Trying to keep AI model names straight, like remembering why “o4” is different from “o3,” often fails because people are attempting to map meaning onto arbitrary text strings. The core fix offered here is to use semantic meaning instead: attach memorable, human-understandable concepts to each model so learners can recall what each one is best at.

To make that idea practical, the creator turns 16 major models from the Hugging Face leaderboards into a printable “card deck.” Each card gets a one-word label meant to capture the model’s strongest vibe or typical use case, plus classroom-style exercises and “model card” formatting designed for learning. The goal isn’t to claim the labels are perfect or exhaustive; it’s to give people a mental handle that helps them remember which model to reach for when they need a particular kind of output.
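To make the format concrete, here is a rough sketch in Python of the information one such card might carry, based only on the description above; the field names and example values are assumptions, not the creator’s actual card format.

```python
# Rough sketch of what one printable model card holds, per the description above.
# Field names and example values are assumptions, not the creator's actual format.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    model: str                 # e.g. "o3"
    label: str                 # one-word semantic handle, e.g. "Artificer"
    strengths: list[str]       # what the model is typically best at
    caveats: list[str]         # every card carries caveats; no model is flawless
    exercises: list[str] = field(default_factory=list)  # classroom-style practice prompts

o3_card = ModelCard(
    model="o3",
    label="Artificer",
    strengths=["technical competence", "hard-problem solving", "creation"],
    caveats=["the label can feel a bit cold"],
)
print(f"{o3_card.model} -> {o3_card.label}")
```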

The deck’s examples start with o3, labeled “Artificer.” The word is chosen to signal technical competence, hard-problem solving, and creation, though it can feel a bit cold. Another card, “Voyager,” goes to Yi (a 200-billion-parameter model) and is framed around bridging cultures and languages, reflecting Yi’s specialization in English–Chinese fluency. Claude 4 Opus is labeled “Polymath,” tying the name to strong reading and critique, support for improving writing, and code problem solving.

Grok is labeled “Maverick,” with the explanation that it tends toward unconventional ideas and inventiveness, including references to its sourcing from an X (Twitter) stream; the card also flags misalignment concerns. Perplexity, treated as a model despite common confusion about it being “just search,” is labeled “Sonar,” emphasizing that its web-search-oriented system is what drives its strong leaderboard performance. Llama 3 405B is included, and Mixtral 8x22B is given the label “Collective,” chosen to reflect its mixture-of-experts voting behavior over tokens; the card also highlights privacy as a notable trait.

Every card includes caveats, because no model is presented as flawless. The deck’s design also blends “Magic: The Gathering”-style visuals with nerdy AI diagrams to make each underlying mechanism easier to remember.

Finally, the creator shares a personal “stack” for day-to-day use. o3 is the daily driver, handling roughly 60–70% of queries. GPT-4o is used about 10–15% of the time for simpler tasks like rewording, reformatting, and markdown, plus more companionable conversation. Claude 4 Opus is used at a similar rate for coding structure and problem-solving, though it’s described as weaker for long-context chats. Gemini 2.5 Pro is used as a verifier and fact-checker when trust is low, and Deep Research is reserved for slower, high-quality work that benefits from time away.

The takeaway is less about any single label and more about learning: people remember systems better when meaning is semantic, not arbitrary—so model makers’ naming conventions and documentation should help users build that mental map.

Cornell Notes

The central idea is that model names are hard to learn because they don’t carry semantic meaning. To fix that, a printable “card deck” approach assigns each major AI model a one-word label tied to what it’s typically best at, plus classroom-style exercises and caveats. Examples include o3 as “Artificer” (technical problem solving and creation), Yi (200 billion parameters) as “Voyager” (bridging English and Chinese), and Claude 4 Opus as “Polymath” (critique, writing support, and code problem solving). Grok becomes “Maverick” with a note about misalignment risks, while Perplexity is framed through “Sonar” for web-search performance. The practical payoff is a clearer mental model for choosing the right system, reinforced by a personal usage stack and reminders that no model is perfect.

Why does semantic meaning beat memorizing arbitrary model names?

The approach argues that humans don’t learn well by attaching random meaning to text strings. Instead, people remember better when the label carries semantic cues—like a story or concept—so the brain can retrieve the right association later. That’s why each card uses a one-word “vibe” tied to typical strengths, rather than expecting learners to decode naming conventions.

How do the card labels map to concrete strengths (examples)?

o3 is labeled “Artificer” to reflect technical competence, hard problem solving, and creation. Yi (200 billion parameters) is labeled “Voyager” because Yi is specialized in English–Chinese fluency, so the word signals bridging cultures. Claude 4 Opus is labeled “Polymath,” reflecting strong reading and critique along with strong performance when prompted for writing and code problem solving.

What’s the rationale for including caveats on every model card?

The deck treats model imperfections as part of learning. Each card calls out issues or risks—Grok’s card, for instance, flags recent misalignment concerns—so users don’t assume a label guarantees reliability. The goal is balanced “balls and strikes,” not blind promotion.

How is Perplexity handled given confusion about what it “is”?

Perplexity is described as often mistaken for a simple LLM-powered search engine, but the card frames its strength through Sonar, a web-search-oriented system that drives its leaderboard performance. The label is meant to steer attention to the component that drives results.

What does “Collective” communicate about Mixtral 8x22B?

“Collective” is chosen to reflect mixture-of-experts behavior: multiple expert components vote on tokens. The card’s visuals (e.g., a diagram of multiple people around a model concept) are meant to make that voting mechanism memorable, and the card also notes privacy as a notable trait.
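For readers who want the mechanism rather than the metaphor, here is a minimal, illustrative sketch of top-k mixture-of-experts routing in Python with NumPy. It is not Mixtral’s actual implementation; the sizes, gate matrix, and toy experts are invented for demonstration only.

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative, not Mixtral's code).
# A gating network scores every expert per token, the top-k experts process the token,
# and their outputs are combined by normalized gate weights -- the "voting" on tokens.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, TOP_K, D_MODEL = 8, 2, 16  # Mixtral-style shape: 8 experts, 2 active per token

W_gate = rng.normal(size=(D_MODEL, N_EXPERTS))                               # toy gating weights
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]    # toy "experts"

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts and mix the results."""
    logits = token @ W_gate                                   # one score per expert
    top = np.argsort(logits)[-TOP_K:]                         # indices of the k best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum() # softmax over the winners
    outputs = np.stack([token @ experts[i] for i in top])     # each chosen expert's output
    return (weights[:, None] * outputs).sum(axis=0)           # weighted "vote" of the experts

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,) -- same dimensionality as the input token
```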

How does the personal “stack” translate into model choice?

The usage split is practical: o3 handles about 60–70% of queries as the daily driver. GPT-4o is used 10–15% for simple transformations like rewording, reformatting, and markdown, and for warmer conversation. Claude 4 Opus is used about 10–15% for structuring coding problems and back-and-forth problem solving, with a noted weakness in long-context chats. Gemini 2.5 Pro is used as a second opinion for fact-checking when trust is low, and Deep Research is used for slower, high-quality outputs that benefit from time.
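One way to internalize that split is to write it down as a dispatch table. The sketch below is a hypothetical Python rendering of the stack described above; the task categories and the `pick_model` helper are illustrative, not any real API.

```python
# Hypothetical task router reflecting the personal stack described above.
# The task categories and pick_model() helper are illustrative, not a real API.
STACK = {
    "default":       "o3",              # daily driver, ~60-70% of queries
    "light_rewrite": "GPT-4o",          # rewording, reformatting, markdown, warmer chat (~10-15%)
    "coding":        "Claude 4 Opus",   # structuring code problems (~10-15%); weaker long-context chat
    "fact_check":    "Gemini 2.5 Pro",  # second opinion / verifier when trust is low
    "slow_research": "Deep Research",   # slower, high-quality work that benefits from time
}

def pick_model(task_type: str) -> str:
    """Return the model the stack would reach for, falling back to the daily driver."""
    return STACK.get(task_type, STACK["default"])

print(pick_model("coding"))      # Claude 4 Opus
print(pick_model("brainstorm"))  # o3 (fallback / daily driver)
```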

Review Questions

  1. Which learning principle motivates the one-word model labels, and how does it change how a user selects a model?
  2. Pick three models from the deck and match each to its one-word label and the specific strength described for it.
  3. What role does Gemini 2.5 Pro play in the personal stack, and why is it used instead of relying on a single model?

Key Points

  1. Use semantic meaning—not arbitrary name decoding—to help people remember what each AI model is best at.
  2. A printable “card deck” format can turn model selection into a visual, classroom-friendly learning tool.
  3. o3 is framed as “Artificer,” emphasizing technical competence, hard problem solving, and creation.
  4. Yi (200 billion parameters) is framed as “Voyager,” reflecting its strength in English–Chinese fluency and cross-cultural bridging.
  5. Claude 4 Opus is framed as “Polymath,” tied to critique, writing support, and code problem solving.
  6. Grok is framed as “Maverick,” with unconventional ideation paired with misalignment caveats.
  7. A practical stack uses o3 for most tasks, GPT-4o for quick rewrites and warmer chat, Claude 4 Opus for coding structure, Gemini 2.5 Pro for verification, and Deep Research for high-quality work that needs time.

Highlights

The core learning fix is semantic labeling: attach human-rememberable meaning to model capabilities instead of trying to decode naming conventions.
Turning 16 major models into printable cards makes model choice teachable—each card includes a one-word strength label plus caveats.
Gemini 2.5 Pro is positioned as a grounded second opinion for fact-checking when trust in other models is low.
Mixtral 8x22B’s “Collective” label is explicitly tied to mixture-of-experts voting on tokens, not just a vague “ensemble” idea.

Topics

  • AI Model Naming
  • Semantic Memory
  • Printable Model Cards
  • Mixture of Experts
  • Model Selection Stack