Haiku 4.5 - Small Beats Big
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Claude Haiku 4.5 is arriving with higher prices, but it’s also delivering a rare mix of speed and task performance that makes it a strong candidate to become the default “grunt work” model for agent building—especially when paired with larger models for planning and reasoning.
Pricing is the immediate sticking point. Claude Haiku 4.5 is listed at $1 per million input tokens and $5 per million output tokens. That’s a step up from earlier Haiku tiers, where Haiku 3.5 was $4 per million output tokens and Haiku 3 was $1.25 per million output tokens. On cost alone, it would be easy to dismiss the upgrade—particularly for teams that previously relied on Haiku for cheap, high-throughput workloads. But the performance comparisons suggest the higher output price may be offset by faster execution and better results on agent-style tasks.
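To make the pricing gap concrete, here is a back-of-envelope sketch in Python. The per-call token counts are illustrative assumptions, and the older tiers' input prices ($0.80 per million for Haiku 3.5, $0.25 per million for Haiku 3) are Anthropic's published rates rather than figures from the transcript.

```python
# Back-of-envelope cost per 1,000 agent calls, assuming ~2,000 input
# and ~500 output tokens per call. Output prices come from the text;
# the older tiers' input prices are Anthropic's published rates,
# included here as assumptions.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "haiku-4.5": (1.00, 5.00),
    "haiku-3.5": (0.80, 4.00),
    "haiku-3":   (0.25, 1.25),
}

CALLS, IN_TOK, OUT_TOK = 1_000, 2_000, 500

for model, (p_in, p_out) in PRICES.items():
    cost = CALLS * (IN_TOK * p_in + OUT_TOK * p_out) / 1_000_000
    print(f"{model}: ${cost:.2f} per {CALLS} calls")
# haiku-4.5: $4.50, haiku-3.5: $3.60, haiku-3: $1.12
```

The raw sticker price roughly quadruples versus Haiku 3, so the case for the upgrade rests on each call doing more useful work, not on the per-token rate.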
Anthropic positions Haiku 4.5 as a model that can outperform Claude Sonnet 4 on several benchmarks, including SWE-bench and other agent benchmarks, plus "computer use" style evaluations. Haiku 4.5 isn't framed as a full replacement for the top-tier models on reasoning; instead, it's portrayed as a fast, capable workhorse. The practical workflow implied is straightforward: use a frontier model (Sonnet 4.5 and potentially Opus 4.5) for planning and deeper reasoning, then delegate execution-heavy steps, such as tool use, structured outputs, coding tasks, and other repetitive agent actions, to Haiku 4.5.
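A minimal sketch of that planner/executor split using the Anthropic Python SDK might look like the following. The model ID strings and the single plan-then-execute loop are assumptions for illustration, not the transcript's exact setup.

```python
# Minimal planner/executor split: a larger model drafts the plan,
# Haiku 4.5 executes each step. Model IDs below are assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PLANNER = "claude-sonnet-4-5"   # assumed model ID for Sonnet 4.5
WORKER = "claude-haiku-4-5"     # assumed model ID for Haiku 4.5

def ask(model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# 1) The frontier model does the planning/reasoning once.
plan = ask(PLANNER, "Break 'summarize these 50 support tickets by theme' "
                    "into numbered, independently executable steps.")

# 2) The cheap, fast model grinds through each step.
for step in [s for s in plan.splitlines() if s.strip()]:
    print(ask(WORKER, f"Execute this step and return only the result:\n{step}"))
```

Because the planner runs once and the worker runs per step, the expensive model's cost is amortized across every delegated execution.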
The transcript emphasizes that Haiku 4.5 is not just competitive on intelligence metrics; it’s also faster. Benchmarks are described as showing Haiku 4.5 beating models such as GPT-5 and Gemini 2.5 Pro on coding and other tasks, while also running about twice as fast as Sonnet 4. That speed matters because agent systems increasingly depend on rapid iterations: calling tools, producing structured outputs, and completing multi-step tasks where latency compounds.
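A toy calculation shows why per-step speed compounds. The per-step latencies below are hypothetical round numbers chosen to mirror the "about twice as fast" claim; they are not measured figures from the transcript.

```python
# Illustrative only: how per-step latency compounds over an agent loop.
# The 2.0s vs 4.0s per-step figures are hypothetical stand-ins for the
# "about twice as fast" claim, not measured numbers.
STEPS = 20
for name, per_step_s in [("haiku-4.5", 2.0), ("sonnet-4", 4.0)]:
    print(f"{name}: {STEPS * per_step_s:.0f}s for a {STEPS}-step run")
# haiku-4.5: 40s, sonnet-4: 80s
```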
A cost comparison further supports the "workhorse" framing. Haiku 4.5 is described as costing about one-third of what Sonnet 4.5 costs, making it attractive as the model that handles the bulk of agent execution even if Sonnet remains the stronger coding and frontier option. In other words, the upgrade isn't about replacing the smartest model; it's about making the delegated layer cheaper per completed task.
To ground the claims, the transcript reports timing tests run through GCP, using measures like time to first token and total response time. Claude Haiku 4.5 is reported to return a response in about 3.6 seconds, with roughly half a second to the first token. Sonnet 4.5 shows longer time-to-first-token and longer full response time, while Sonnet 4 is slower still on first-token timing. Compared with earlier Haiku versions, Haiku 4.5 shows a meaningful speed bump over Haiku 3.5, and it’s described as balancing intelligence improvements with faster or comparable generation times versus Claude 3 Haiku.
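A rough way to reproduce that kind of measurement is with the Anthropic SDK's streaming API. The transcript ran its tests through GCP (Vertex AI); this sketch hits the direct API instead, so absolute timings will differ, and the model ID is an assumption.

```python
# Rough latency probe: time to first token and total response time.
# This hits the Anthropic API directly; the transcript's numbers came
# from GCP (Vertex AI), so absolute timings will differ.
import time
import anthropic

client = anthropic.Anthropic()

def time_model(model: str, prompt: str):
    start = time.perf_counter()
    first_token = None
    with client.messages.stream(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for _ in stream.text_stream:
            if first_token is None:
                first_token = time.perf_counter() - start
    return first_token, time.perf_counter() - start

ttft, total = time_model("claude-haiku-4-5", "List five uses of a paperclip.")
print(f"time to first token: {ttft:.2f}s, total: {total:.2f}s")
```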
Overall, the transcript paints Haiku 4.5 as a practical default for agent “grunt work”: fast tool-using execution, structured classification, and high-volume coding or task steps—while reserving the heaviest reasoning for larger models.
Cornell Notes
Claude Haiku 4.5 costs more than earlier Haiku versions, but it's positioned as a faster, more capable workhorse for agent workflows. Benchmarks cited in the transcript claim Haiku 4.5 can outperform Claude Sonnet 4 on tasks like SWE-bench and other agent benchmarks, while still lagging behind the strongest models on pure reasoning. The practical strategy described is to use a frontier model (Sonnet 4.5 and possibly Opus 4.5) for planning and reasoning, then delegate execution-heavy steps, such as tool use, structured outputs, and coding, to Haiku 4.5. Timing tests reported via GCP show very quick time to first token and fast total response times, reinforcing its role as a low-latency grunt model.
- Why does higher Haiku 4.5 pricing not automatically make it a worse choice for agents?
- What benchmark-style claim is used to justify Haiku 4.5 as more than a "cheap model"?
- How does the transcript describe the division of labor between Haiku 4.5 and larger models?
- What timing results are reported for Haiku 4.5 versus the Sonnet models?
- How does Haiku 4.5 compare to earlier Haiku versions in speed?
Review Questions
- When would it make sense to pay for a frontier model instead of delegating to Haiku 4.5 in an agent pipeline?
- Which latency metric (time to first token vs total response time) most affects multi-step agent workflows, and why?
- How do the transcript’s benchmark claims and the reported timing tests reinforce (or contradict) each other?
Key Points
1. Claude Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens, higher than earlier Haiku releases.
2. Anthropic's benchmark comparisons claim Haiku 4.5 can outperform Claude Sonnet 4 on SWE-bench and several agent benchmarks, including computer-use evaluations.
3. The recommended agent pattern is to use Sonnet 4.5 (and possibly Opus 4.5) for planning/reasoning, while delegating execution-heavy steps to Haiku 4.5.
4. Speed is framed as a major advantage: Haiku 4.5 is described as about twice as fast as Sonnet 4, which matters for tool-using, multi-step agents.
5. Reported GCP timing tests show Haiku 4.5 returning in ~3.6 seconds with ~0.5 seconds to the first token, faster than Sonnet 4.5 and Sonnet 4 in the same setup.
6. Compared with earlier Haiku models, Haiku 4.5 shows a noticeable speed bump over Haiku 3.5 and faster or comparable generation times versus Claude 3 Haiku.
7. The transcript positions Haiku 4.5 as a default "grunt work" model for structured outputs, classification, and coding tasks where throughput and latency dominate.