A realistic comparison of Opus and Codex
Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Codex 5.3 comes out ahead for day-to-day software work—especially when tasks involve real-world complexity like migrations, PR reviews, and “make it correct” engineering. The tradeoff is speed and vibe: Opus 4.6 often gets to a working UI or a usable first draft faster, but it more frequently leaves behind sloppy details, misses edge cases, or introduces security and correctness issues that later require cleanup.
Pricing and access set the stage for why this comparison is messy. Opus 4.6 has published API rates ($5/M input tokens, $25/M output tokens, with a fast mode that costs 2–3x more per token and can reach 6x higher expense). Codex 5.3 isn’t broadly available over the API yet, limiting direct benchmarking; the best guess is that Codex 5.3 pricing will resemble earlier Codex tiers (e.g., Codex 5.2’s $1.75/M in and $14/M out). Even so, the creator’s practical takeaway is that Codex tends to be cheaper per token, while Opus can look better per “run” when Codex burns more tokens reasoning through correctness. In subscription terms, Codex also appears more generous in usage quotas, with the creator reporting heavy Codex usage while staying far from limits.
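A rough back-of-the-envelope sketch of that per-token vs. per-run distinction: the Opus rates echo the figures above, the Codex 5.3 rates are assumed to match Codex 5.2, and the token counts are invented purely for illustration, not measured from either model.

```ts
// Hedged sketch: Codex 5.3 rates assumed equal to Codex 5.2; token counts
// are made-up illustrative numbers, not measurements from the video.
const rates = {
  opus: { inPerM: 5, outPerM: 25 },     // published Opus 4.6 API rates
  codex: { inPerM: 1.75, outPerM: 14 }, // guessed from Codex 5.2 pricing
};

function runCostUSD(inTokens: number, outTokens: number, r: { inPerM: number; outPerM: number }): number {
  return (inTokens * r.inPerM + outTokens * r.outPerM) / 1_000_000;
}

// Same task, but Codex reasons longer and emits roughly 3x the output tokens.
const opusRun = runCostUSD(40_000, 8_000, rates.opus);    // ≈ $0.40
const codexRun = runCostUSD(40_000, 24_000, rates.codex); // ≈ $0.41

console.log({ opusRun, codexRun }); // cheaper per token, yet comparable per run
```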
On “intelligence” and hard problem solving, Codex repeatedly wins—sometimes by a lot. In a difficult migration of an old codebase (Round/ping.gg, built on an early T3 stack), Codex 5.3 was the first model to succeed. Its approach: bump what it needs, patch temporarily to unblock, then remove patches once the dependency chain is stabilized—an iterative strategy that avoids the cascade failure pattern where a model upgrades everything and breaks the rest.
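A minimal sketch of what “patch temporarily to unblock” can look like in practice (the module and library names here are hypothetical, not taken from the video): old call sites are pointed at a small compatibility wrapper so each package in the chain can be bumped independently, and the wrapper is deleted once the last caller is migrated.

```ts
// compat/fetch-user.ts — hypothetical temporary shim, removed once the
// dependency chain is stabilized.
import { getUser } from "upgraded-lib"; // hypothetical v2 API: takes an options object

// v1-shaped helper kept alive so packages can be upgraded one at a time
// instead of forcing an all-at-once upgrade that breaks everything else.
export async function fetchUser(id: string): Promise<{ id: string; name: string }> {
  const user = await getUser({ id });
  return { id: user.id, name: user.name }; // v1 callers expect this flat shape
}
```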
Opus’s strengths show up when the goal is to unblock quickly or produce front-end work that looks good. The creator describes Opus as a faster “get it working” partner, particularly for UI design and for “computer-adjacent” tasks like editing dotfiles, SSH-ing to machines, and configuring systems. Opus also sometimes succeeds where Codex fails—such as getting a migration to complete without triggering the kind of deep “fix everything” loop that can trap thorough models.
The biggest fault line is diligence versus shortcuts. Codex is portrayed as “measure twice, cut once”: it tends to notice missing details, handle blockers directly, and avoid leaving insecure or inconsistent code behind. Opus is portrayed as “measure less, ship sooner”: it may ignore blockers by trimming scope, and it can produce working code that later turns out to be wrong or insecure. The creator cites examples involving environment variable handling, database schema/type safety gaps, and even a security-relevant bug where Opus made user association nullable in image generation.
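As a hedged illustration of that last failure mode (this schema is hypothetical, not the actual ping.gg code), the difference often comes down to a single constraint: a generated image whose userId can be null has no enforced owner, so downstream ownership checks can quietly pass or be skipped. A Drizzle-style sketch:

```ts
import { pgTable, text, timestamp, uuid } from "drizzle-orm/pg-core";

// Hypothetical schema sketch, not the real codebase from the video.
export const generatedImages = pgTable("generated_images", {
  id: uuid("id").primaryKey().defaultRandom(),
  prompt: text("prompt").notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
  // The correctness/security-relevant choice: ownership must be required.
  // A shortcut version would drop .notNull(), leaving orphaned images that
  // no ownership check can reliably attribute to a user.
  userId: text("user_id").notNull(),
});
```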
Finally, the comparison isn’t just about model IQ—it’s about platform behavior and trust. The creator prefers Codex for codebase safety and security work, but prefers Opus for the day-to-day experience: faster iteration, more pleasant interaction, and better front-end polish. They also criticize harness quirks (especially Claude Code) and highlight that Codex’s thoroughness can sometimes become counterproductive in long-running tasks.
Bottom line: if forced to pick one model for serious engineering, Codex is the safer default. If the priority is speed, UI aesthetics, and a more enjoyable workflow, Opus remains compelling—often best used as a complementary tool rather than a replacement.
Cornell Notes
Codex 5.3 is favored for solving difficult, real engineering problems—especially migrations, code reviews, and “make it correct” tasks—because it handles blockers more directly and tends to avoid missing important details. Opus 4.6 is often faster at getting something working and is particularly strong for front-end design and system-adjacent tasks like configuring machines and editing local files. The creator’s key pattern is diligence vs. shortcuts: Codex “measures twice, cuts once,” while Opus may ship sooner but leave cleanup work (including occasional security or correctness issues). Pricing is complicated by Codex 5.3’s limited API availability, but subscription usage and token behavior suggest Codex can be more cost-effective per token and more generous in quotas. The practical recommendation is to default to Codex for codebase-critical work and use Opus when speed and UI polish matter most.
Why does Codex 5.3 repeatedly outperform Opus 4.6 on “hard problems” like migrations?
What specific kinds of mistakes does Opus 4.6 make that create extra cleanup later?
How do the models differ in front-end work and UI iteration?
What does the creator mean by “diligence can hurt” with Codex?
How does the creator decide which model to use in practice?
Why does the comparison include platform and harness issues, not just model quality?
Review Questions
- In the Round/ping.gg migration example, what specific technique did Codex use to avoid a dependency upgrade cascade?
- Give one example of an Opus failure mode that required follow-up cleanup, and explain why it mattered (correctness vs. security vs. type safety).
- When does Codex’s thoroughness become a liability, and what symptom did the creator observe in the long-running AI SDK v6 migration?
Key Points
1. Codex 5.3 is the default choice for correctness-heavy work like migrations, PR reviews, and security audits, because it tends to avoid missing key details and handles blockers more directly.
2. Opus 4.6 often reaches a usable result faster and is especially strong for front-end design and UI polish, but it more frequently leaves behind issues that require cleanup.
3. Pricing comparisons are complicated by Codex 5.3’s limited API availability; practical subscription usage and token behavior become the main evidence for cost and quota differences.
4. Codex’s diligence can sometimes turn into over-fixing or runaway thoroughness in long-running tasks, while Opus may “ship sooner” by trimming scope or skipping certain checks.
5. A productive workflow is often complementary: use Opus to unblock when Codex gets stuck, then use Codex to finish correctly and comprehensively.
6. Harness/tooling reliability (e.g., Claude Code vs Codex CLI/desktop) materially affects trust and day-to-day productivity, not just model capability.
7. If forced to pick one model for serious engineering, Codex is recommended; if the priority is speed and a more pleasant interaction for iterative work, Opus is a strong alternative.