
Haiku 4.5 - Small Beats Big

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Claude Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens, higher than earlier Haiku releases.

Briefing

Claude Haiku 4.5 is arriving with higher prices, but it’s also delivering a rare mix of speed and task performance that makes it a strong candidate to become the default “grunt work” model for agent building—especially when paired with larger models for planning and reasoning.

Pricing is the immediate sticking point. Claude Haiku 4.5 is listed at $1 per million input tokens and $5 per million output tokens. That’s a step up from earlier Haiku tiers, where Haiku 3.5 was $4 per million output tokens and Haiku 3 was $1.25 per million output tokens. On cost alone, it would be easy to dismiss the upgrade—particularly for teams that previously relied on Haiku for cheap, high-throughput workloads. But the performance comparisons suggest the higher output price may be offset by faster execution and better results on agent-style tasks.
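As a rough sanity check on that pricing arithmetic, the quoted rates work out to fractions of a cent per typical agent step. The token counts below are illustrative, not figures from the transcript:

```python
# Cost sketch at the quoted Haiku 4.5 rates: $1/M input, $5/M output.
PRICE_IN = 1.00 / 1_000_000   # USD per input token
PRICE_OUT = 5.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single call at the quoted rates."""
    return input_tokens * PRICE_IN + output_tokens * PRICE_OUT

# A hypothetical agent step: 2,000 input tokens, 500 output tokens.
cost = request_cost(2_000, 500)
print(f"${cost:.4f}")  # → $0.0045
```

At that scale, the per-step cost difference between Haiku tiers only dominates at very high request volumes, which is why the transcript weighs it against speed.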

Anthropic positions Haiku 4.5 as a model that can outperform Claude Sonnet 4 on several benchmarks, including SWE-bench and other agent benchmarks, plus “computer use” style evaluations. Haiku 4.5 isn’t framed as a full replacement for the top-tier reasoning models; instead, it’s portrayed as a fast, capable workhorse. The practical workflow implied is straightforward: use a frontier model (Sonnet 4.5 and potentially Opus 4.5) for planning and deeper reasoning, then delegate execution-heavy steps—tool use, structured outputs, coding tasks, and other repetitive agent actions—to Haiku 4.5.
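The plan-then-delegate pattern described above can be sketched in a few lines. Everything here is a placeholder: `call_model` stands in for a real provider API call, and the model names are illustrative labels, not real model identifiers:

```python
# Two-tier agent sketch: a large "planner" model produces steps,
# a small "executor" model carries each step out.
PLANNER = "frontier-planner"   # e.g. a Sonnet/Opus-class model
EXECUTOR = "fast-executor"     # e.g. a Haiku-class model

def call_model(model: str, prompt: str) -> str:
    # Stub: a real implementation would call the provider's API here.
    if model == PLANNER:
        return "1. fetch data\n2. summarize"
    return f"done: {prompt}"

def run_agent(task: str) -> list[str]:
    """Plan once with the large model, execute each step with the small one."""
    plan = call_model(PLANNER, f"Plan steps for: {task}")
    steps = [line.split(". ", 1)[1] for line in plan.splitlines()]
    return [call_model(EXECUTOR, step) for step in steps]

results = run_agent("build a weekly report")
```

The design point is that the expensive model is called once per task, while the cheap, fast model absorbs the many repetitive execution calls.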

The transcript emphasizes that Haiku 4.5 is not just competitive on intelligence metrics; it’s also faster. Benchmarks are described as showing Haiku 4.5 beating models such as GPT-5 and Gemini 2.5 Pro on coding and other tasks, while also running about twice as fast as Sonnet 4. That speed matters because agent systems increasingly depend on rapid iterations: calling tools, producing structured outputs, and completing multi-step tasks where latency compounds.

A cost comparison further supports the “workhorse” framing. Haiku 4.5 is described as costing about one-third of what Sonnet 4.5 will cost, making it attractive as the model that handles the bulk of agent execution even if Sonnet remains the best coding and frontier option. In other words, the upgrade isn’t about replacing the smartest model—it’s about making the delegated layer cheaper per completed task.

To ground the claims, the transcript reports timing tests run through GCP, using measures like time to first token and total response time. Claude Haiku 4.5 is reported to return a response in about 3.6 seconds, with roughly half a second to the first token. Sonnet 4.5 shows longer time-to-first-token and longer full response time, while Sonnet 4 is slower still on first-token timing. Compared with earlier Haiku versions, Haiku 4.5 shows a meaningful speed bump over Haiku 3.5, and it’s described as balancing intelligence improvements with faster or comparable generation times versus Claude 3 Haiku.
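Both metrics mentioned here—time to first token and total response time—are easy to measure yourself against any streaming response. The stream below is simulated, but the same timing logic applies to a real SDK stream:

```python
import time
from typing import Iterable, Iterator

def measure_stream(chunks: Iterable[str]) -> tuple[float, float, str]:
    """Return (time_to_first_token, total_time, text) for a token stream."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            # First chunk arrived: record time to first token.
            ttft = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return ttft, total, "".join(parts)

def fake_stream() -> Iterator[str]:
    # Stand-in for a streaming model response.
    for tok in ["Hello", ", ", "world"]:
        time.sleep(0.01)  # simulated network/model latency per chunk
        yield tok

ttft, total, text = measure_stream(fake_stream())
```

For multi-step agents, total response time usually matters more than time to first token, since downstream tool calls wait on the complete output.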

Overall, the transcript paints Haiku 4.5 as a practical default for agent “grunt work”: fast tool-using execution, structured classification, and high-volume coding or task steps—while reserving the heaviest reasoning for larger models.

Cornell Notes

Claude Haiku 4.5 costs more than earlier Haiku versions, but it’s positioned as a faster, more capable workhorse for agent workflows. Benchmarks cited in the transcript claim Haiku 4.5 can outperform Claude Sonnet 4 on tasks like SWE-bench and agent benchmarks, while still lagging behind the strongest models on pure reasoning. The practical strategy described is to use a frontier model (Sonnet 4.5 and possibly Opus 4.5) for planning and reasoning, then delegate execution-heavy steps—tool use, structured outputs, and coding—to Haiku 4.5. Timing tests reported via GCP show very quick “time to first token” and fast total response times, reinforcing its role as a low-latency grunt model.

Why does higher Haiku 4.5 pricing not automatically make it a worse choice for agents?

The transcript argues that cost must be weighed against speed and task performance. Even with $1 per million input tokens and $5 per million output tokens, Haiku 4.5 is described as running about twice as fast as Claude Sonnet 4 and performing strongly on agent benchmarks. In agent systems, faster execution reduces latency across multi-step tool calls and structured outputs, which can lower the effective cost per completed task even if output token pricing is higher.

What benchmark-style claim is used to justify Haiku 4.5 as more than a “cheap model”?

Anthropic’s comparisons highlighted in the transcript say Haiku 4.5 can surpass Claude Sonnet 4 on SWE-bench and multiple agent benchmarks, plus “computer use” evaluations. The transcript also notes Haiku 4.5 is close to Sonnet 4 on reasoning benchmarks, though it’s not framed as the best option when maximum reasoning quality is the priority.

How does the transcript describe the division of labor between Haiku 4.5 and larger models?

A two-tier workflow is emphasized: use Sonnet 4.5 (and potentially Opus 4.5) for planning and deeper reasoning, then use Haiku 4.5 as the fast execution layer. Haiku 4.5 is framed as the model that carries out delegated steps—function calling, structured output generation, and agent actions—so the system can move quickly without paying frontier-model costs for every step.

What timing results are reported for Haiku 4.5 versus Sonnet models?

In reported tests run through GCP, Claude Haiku 4.5 produced a response in about 3.6 seconds, with roughly 0.5 seconds to the first token. Claude Sonnet 4.5 shows longer time-to-first-token and longer full response time, while Claude Sonnet 4 is slower still on first-token timing. The transcript uses these differences to support the claim that Haiku 4.5 is optimized for low latency.

How does Haiku 4.5 compare to earlier Haiku versions in speed?

The transcript reports a speed improvement over Haiku 3.5, with Haiku 3.5 taking about 7 seconds to reach first token (in the same testing setup). Against Claude 3 Haiku, time-to-first-token is described as roughly similar, but total generation time is faster for Haiku 4.5—suggesting a better balance of intelligence and throughput.

Review Questions

  1. When would it make sense to pay for a frontier model instead of delegating to Haiku 4.5 in an agent pipeline?
  2. Which latency metric (time to first token vs total response time) most affects multi-step agent workflows, and why?
  3. How do the transcript’s benchmark claims and the reported timing tests reinforce (or contradict) each other?

Key Points

  1. Claude Haiku 4.5 is priced at $1 per million input tokens and $5 per million output tokens, higher than earlier Haiku releases.

  2. Anthropic’s benchmark comparisons claim Haiku 4.5 can outperform Claude Sonnet 4 on SWE-bench and several agent benchmarks, including computer use evaluations.

  3. The recommended agent pattern is to use Sonnet 4.5 (and possibly Opus 4.5) for planning/reasoning, while delegating execution-heavy steps to Haiku 4.5.

  4. Speed is framed as a major advantage: Haiku 4.5 is described as about twice as fast as Sonnet 4, which matters for tool-using, multi-step agents.

  5. Reported GCP timing tests show Haiku 4.5 returning in ~3.6 seconds with ~0.5 seconds to the first token, faster than Sonnet 4.5 and Sonnet 4 in the same setup.

  6. Compared with earlier Haiku models, Haiku 4.5 shows a noticeable speed bump over Haiku 3.5 and a faster overall generation time versus Claude 3 Haiku.

  7. The transcript positions Haiku 4.5 as a default “grunt work” model for structured outputs, classification, and coding tasks where throughput and latency dominate.

Highlights

Haiku 4.5 is pitched as a fast execution layer that can beat Claude Sonnet 4 on agent benchmarks like SWE-bench.
A key workflow theme: plan with Sonnet 4.5/Opus 4.5, then delegate the action steps to Haiku 4.5.
Reported timing tests show Claude Haiku 4.5 at ~3.6 seconds total response time and ~0.5 seconds to the first token.
The transcript frames speed as an agent advantage, not just a convenience—latency compounds across tool calls and multi-step tasks.
