
Anthropic Does The Unthinkable with Haiku 3.5

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Claude 3.5 Haiku’s pricing rises to $1 per million input tokens and $5 per million output tokens, roughly a fourfold jump from Claude 3 Haiku.

Briefing

Claude 3.5 Haiku arrives with a major price jump—$1 per million tokens in and $5 per million tokens out—turning what used to be a budget-friendly “workhorse” model into a more premium option. The central question raised by the rollout is whether the intelligence gains are large enough to justify a fourfold cost increase, especially as competitors keep pushing faster, cheaper models.

The update is available not only on Anthropic’s platform but also from day one on Amazon Bedrock and Google Vertex AI. Early testing described in the transcript finds the model promising for agentic workflows, with the claim that it can handle roughly 90% of common LLM tasks. But the pricing change dominates community reaction because the prior Claude 3 Haiku was priced at $0.25 per million input tokens and $1.25 per million output tokens. Moving to Claude 3.5 Haiku is therefore framed as a steep step up rather than a minor adjustment.

Anthropic’s justification is that Claude 3.5 Haiku surpasses Claude 3 Opus on many benchmarks while doing so at a fraction of the cost. That framing implies a shift in how buyers should think about model upgrades: as models get smarter, customers may need to pay more rather than expecting prices to fall. The transcript contrasts this with the broader industry pattern where newer models often come with reduced pricing over time.

A second constraint is capability tradeoffs. Claude 3.5 Haiku cannot accept images, so Anthropic keeps the original Claude 3 Haiku available for image analysis use cases—suggesting the new model is optimized for text-heavy workloads.

Quality-versus-price comparisons add pressure to the pricing decision. In third-party-style “analysis quality” and “quality vs. price” breakdowns cited in the transcript, Claude 3.5 Haiku lands just below Claude 3 Opus but above several open models (including Llama 3.1 and Mistral Nemo variants). Yet it still trails behind stronger closed models like GPT-4o and Gemini 1.5, and—surprisingly—behind GPT-4o-mini in some comparisons. The transcript also notes that speed alone won’t carry the economics: models such as GPT-4o-mini and Gemini 1.5 Flash are described as offering better price performance, with Flash variants potentially costing only a few cents per million tokens.

Finally, the transcript raises an uncertainty that matters for future purchasing: whether the “3.5” naming hides a larger internal compute jump. Because proprietary model families don’t clearly disclose size or architecture changes, the transcript speculates that Claude 3.5 Haiku may be closer to the old Sonnet tier, while the new Sonnet 3.5 may be closer to the old Opus tier—meaning the upcoming “3.5 Opus” could be substantially more expensive.

Over the next few days, the plan is to benchmark where Claude 3.5 Haiku is the best choice (coding and agent orchestration are highlighted) versus when cheaper alternatives—especially Gemini Flash for token-heavy workloads—should win. The broader takeaway is that the market may be entering a phase where new model generations can cost more upfront, even if they remain cheaper than last year’s weaker models.

Cornell Notes

Claude 3.5 Haiku is positioned as a smarter, faster small model, but it comes with a fourfold price increase: $1 per million tokens in and $5 per million tokens out (up from Claude 3 Haiku’s $0.25 in and $1.25 out). Early tests suggest strong performance for agentic workflows and coding, but third-party quality/price comparisons indicate it still lags behind top competitors like GPT-4o and Gemini 1.5, and even behind GPT-4o-mini on some price-performance measures. The model also cannot process images, keeping that capability on the original Claude 3 Haiku. The key learning point is how to decide when the added intelligence is worth the added cost, especially as cheaper, fast alternatives improve.

What changed most with Claude 3.5 Haiku, and why does it matter for everyday use?

The biggest change is pricing. Claude 3.5 Haiku is priced at $1 per million tokens in and $5 per million tokens out, compared with Claude 3 Haiku’s $0.25 in and $1.25 out—about a 4× increase. That matters because many teams previously used Haiku as a low-cost default for extraction, curation, and other high-volume “workhorse” tasks. With the new rates, cost-performance decisions shift toward models that deliver similar quality at lower per-token cost.
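To make the 4× jump concrete, here is a minimal cost calculation using the per-million rates quoted above. The workload volumes (10M input, 2M output tokens per day) are illustrative assumptions, not figures from the video:

```python
# USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "claude-3-haiku": (0.25, 1.25),
    "claude-3.5-haiku": (1.00, 5.00),
}

def workload_cost(model: str, in_millions: float, out_millions: float) -> float:
    """Cost in USD of a workload at the model's per-million-token rates."""
    p_in, p_out = PRICES[model]
    return in_millions * p_in + out_millions * p_out

# Hypothetical daily workload: 10M input tokens, 2M output tokens.
old = workload_cost("claude-3-haiku", 10, 2)    # 10*0.25 + 2*1.25 = 5.00
new = workload_cost("claude-3.5-haiku", 10, 2)  # 10*1.00 + 2*5.00 = 20.00
print(f"old ${old:.2f}/day, new ${new:.2f}/day, ratio {new / old:.1f}x")
```

Because input and output prices both scale by exactly 4, the ratio stays 4× regardless of the input/output mix, which is why the jump hits high-volume "workhorse" tasks hardest.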

What capability tradeoff comes with the new Haiku model?

Claude 3.5 Haiku cannot take images as input. Anthropic keeps the original Claude 3 Haiku available for image analysis and related multimodal use cases, implying Claude 3.5 Haiku is optimized for text-only workloads.

How does Claude 3.5 Haiku compare on quality and price to other models mentioned?

In the cited quality/price breakdowns, Claude 3.5 Haiku is described as faster than many models but still slower than GPT-4o-mini, GPT-4o, and Gemini 1.5 Flash. On price-performance, GPT-4o-mini and Gemini 1.5 Flash are described as beating it, with Flash variants potentially costing only a few cents per million tokens. So the transcript frames Haiku’s advantage as task-dependent rather than universally best.

What does the transcript suggest about why pricing increased so much?

It raises the possibility that the internal compute/size jump is larger than the naming suggests. Because proprietary models don’t clearly disclose architecture changes, the transcript speculates that Claude 3.5 Haiku may be closer to the old Sonnet tier, and that the new Sonnet 3.5 may be closer to the old Opus tier—meaning the upcoming “3.5 Opus” could be even more expensive.

Where does Claude 3.5 Haiku appear to be a strong fit?

Early testing described in the transcript points to coding and agentic workflows as areas where Claude 3.5 Haiku performs well. The practical takeaway is that it may justify its higher cost when tasks benefit from stronger reasoning or orchestration, rather than when simple extraction or curation dominates.

What strategy does the transcript propose for choosing models going forward?

It suggests splitting workloads by cost sensitivity: use cheaper options like Gemini Flash for token-heavy, pricing-sensitive work; use Claude Sonnet (including the new Sonnet 3.5) for orchestration or higher-quality reasoning layers; and consider other reasoning-focused models (the transcript mentions OpenAI o1) for orchestration. The goal is to match model tier to task complexity and token volume.
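The tiering idea above can be sketched as a simple routing function. The model names, task categories, and the token-volume threshold here are illustrative assumptions to show the shape of the decision, not official identifiers or recommended cutoffs:

```python
def pick_model(task_kind: str, monthly_tokens_millions: float) -> str:
    """Route a task to a model tier by cost sensitivity and complexity.

    Tiers follow the transcript's split: cheap/fast models for token-heavy
    work, a stronger model for orchestration, a mid tier for the rest.
    """
    # Token-heavy, price-sensitive work goes to the cheapest fast tier.
    if task_kind in {"extraction", "curation"} and monthly_tokens_millions > 100:
        return "gemini-1.5-flash"
    # Orchestration / high-quality reasoning gets the stronger tier.
    if task_kind == "orchestration":
        return "claude-3.5-sonnet"
    # Coding and agent steps: where 3.5 Haiku is said to earn its price.
    if task_kind in {"coding", "agent-step"}:
        return "claude-3.5-haiku"
    return "claude-3.5-haiku"  # default mid tier

print(pick_model("extraction", 500))   # high-volume extraction -> cheap tier
print(pick_model("orchestration", 5))  # reasoning layer -> stronger tier
```

In practice the threshold would come from your own cost modeling; the point is that the routing decision keys on token volume as much as on task difficulty.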

Review Questions

  1. If Claude 3.5 Haiku is 4× more expensive than Claude 3 Haiku, what kinds of tasks would still justify switching to it?
  2. How does the “no images” limitation change the decision between Claude 3.5 Haiku and the original Claude 3 Haiku?
  3. What evidence in the transcript suggests that speed alone may not be enough to win on cost-performance?

Key Points

  1. Claude 3.5 Haiku’s pricing increases to $1 per million tokens in and $5 per million tokens out, roughly a fourfold jump from Claude 3 Haiku.
  2. Claude 3.5 Haiku is available on Anthropic’s platform and from day one on Amazon Bedrock and Google Vertex AI.
  3. The new Haiku model is text-only: it cannot accept images, while the original Claude 3 Haiku remains the image-capable option.
  4. Quality/price comparisons cited place Claude 3.5 Haiku below GPT-4o and Gemini 1.5, and sometimes even behind GPT-4o-mini on price-performance.
  5. The transcript highlights coding and agentic workflows as early areas where Claude 3.5 Haiku can justify its higher cost.
  6. Uncertainty remains about how much compute changed behind the “3.5” naming, raising questions about future pricing for a potential “3.5 Opus.”
  7. A workload-based strategy is recommended: use cheaper fast models for high-token-volume tasks and reserve stronger (often pricier) models for orchestration and complex reasoning.

Highlights

Claude 3.5 Haiku moves from a budget tier to a premium one, with $1 in and $5 out per million tokens—about 4× the prior Haiku pricing.
The model’s text-only limitation (no image input) keeps multimodal work on the original Claude 3 Haiku.
Even with strong performance for agentic workflows, third-party quality/price comparisons suggest cheaper competitors like GPT-4o-mini and Gemini 1.5 Flash can outperform on cost-performance.
The transcript flags a naming ambiguity: “3.5” may hide a larger compute shift, making future Opus pricing a major unknown.

Topics

  • Claude 3.5 Haiku Pricing
  • Agentic Workflows
  • Cost vs Quality
  • Multimodal Limitations
  • Model Tiering Strategy
