Claude Code BIG Update: Assign a Model to Your SubAgents
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Claude Code now allows selecting a specific model for each sub-agent, enabling role-based model specialization.
Briefing
A major Claude Code update now lets users assign a specific model to each individual sub-agent, turning “one model for everything” into a more targeted setup that can cut latency and reduce token waste. In practice, you edit a sub-agent’s definition and pick its model from options such as Haiku, Opus, and Sonnet, or let it inherit the model of the parent session. The practical payoff is speed: a simple dice-rolling agent ran dramatically faster depending on which model was assigned.
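As a rough sketch of what that looks like in configuration, assuming the standard Claude Code sub-agent format (a Markdown file with YAML frontmatter under .claude/agents/), the model choice is a single frontmatter field; the file name, agent name, and prompt below are illustrative placeholders, not taken from the video:

```markdown
---
# Illustrative file: .claude/agents/example-agent.md
name: example-agent
description: Handles one narrow, well-defined task for the main agent.
# The new field: pick a specific model, or "inherit" to reuse the parent session's model.
model: inherit   # alternatives: haiku, sonnet, opus
---

System prompt for the sub-agent goes here, describing exactly what it
should do and what it should return to the calling agent.
```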
To test the change, a dice roller sub-agent was configured with instructions to simulate rolling a standard six-sided die exactly 10 times and return the results. With the sub-agent set to Opus, the run took about 1 minute and 5 seconds. Switching the same agent to Sonnet cut the runtime to roughly 17 seconds. Moving to Haiku was even faster, landing around 7.5 seconds. The results weren’t treated as a rigorous benchmark—there’s nondeterminism, and the agent behavior varied—but the pattern was clear enough to suggest model selection is a lever for performance.
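Using the same layout, the dice-roller agent from the test might look roughly like the sketch below, with only the model line changed between runs; the exact prompt wording is an assumption based on the described behavior:

```markdown
---
name: dice-roller
description: Simulates rolling a standard six-sided die and reports the results.
model: haiku   # swapped to sonnet or opus for the other timing runs
---

Simulate rolling a standard six-sided die exactly 10 times.
Return the 10 individual results and nothing else.
```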
The tests also highlighted a behavioral difference: Opus took a completely different approach, generating and executing a Python script to perform the dice rolls. That raised a practical question about trust and verification: without logging or visibility into what the sub-agent actually executed, it is hard to tell whether outputs are genuinely computed or merely echoed back. Still, the runtime gap remained the strongest signal that model assignment matters.
A second experiment chained three specialized sub-agents to build a small project end-to-end. A planner agent used Opus to create a sprint plan. An execution agent used Sonnet to implement the plan by generating a simple HTML app that fetches the current Bitcoin price from the CoinGecko API and displays it in a dark-themed interface with auto-refresh and time/date details. Finally, a documentation agent used Haiku to write the README and project overview. This division matched typical workload needs: heavier reasoning for planning, stronger execution for building, and lightweight summarization for documentation.
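One plausible way to lay out those three roles as sub-agent files is sketched below, with only the model line differing per role; the agent names, descriptions, and prompts are illustrative assumptions rather than the exact files from the video:

```markdown
<!-- Illustrative file: .claude/agents/sprint-planner.md -->
---
name: sprint-planner
description: Turns a project request into a short, ordered sprint plan.
model: opus      # heavier reasoning for planning
---
Break the requested project into a concise sprint plan with ordered tasks.

<!-- Illustrative file: .claude/agents/builder.md -->
---
name: builder
description: Implements the sprint plan as working code.
model: sonnet    # stronger execution for building
---
Implement the plan; for this project, a single-page dark-themed HTML app that
fetches the current Bitcoin price from the CoinGecko API, auto-refreshes, and
shows the time and date.

<!-- Illustrative file: .claude/agents/doc-writer.md -->
---
name: doc-writer
description: Writes the README and project overview.
model: haiku     # lightweight summarization for documentation
---
Write a concise README and project overview for the finished app.
```

Chaining then amounts to invoking these agents in sequence from the main conversation, keeping each role as cheap or as heavyweight as its task demands.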
During chaining, the workflow also surfaced token-management behavior. When the planner passed its context back to the main agent, the primary context window grew noticeably (the transcript notes roughly 400 tokens added to the main context, while the chain as a whole consumed a much larger total, around 15,000 tokens). Even with that overhead, the chained workflow completed successfully, producing the tracker app and a quickly generated README.
Overall, the update reframes agent design as an engineering decision: assign the right model to each sub-agent based on task complexity, then chain them for multi-step outcomes. The immediate promise is faster runs and more efficient token usage, with the added benefit of clearer specialization across planning, execution, and documentation.
Cornell Notes
Claude Code’s update adds model assignment per sub-agent, letting users choose haiku, opus, or sonnet for each role instead of using a single model everywhere. A dice-roller sub-agent ran far faster when switched from Opus (~1:05) to Sonnet (~17s) and then to Haiku (~7.5s), though behavior varied and nondeterminism prevents strict benchmarking. A chained workflow used Opus for sprint planning, Sonnet for building a CoinGecko-based Bitcoin price tracker, and Haiku for writing the README. The setup also changes how much context is carried between agents, with the transcript noting a noticeable context-window increase during chaining. The practical takeaway: model selection becomes a performance and token-efficiency tool when designing multi-agent systems.
- What changed in Claude Code that affects how sub-agents run?
- How did model choice affect the dice-roller agent’s speed?
- Why did Opus raise a trust/verification concern in the test?
- What role-based model assignment worked well in the chained project example?
- What token/context behavior showed up when chaining agents?
Review Questions
- In the dice-roller test, what were the approximate runtimes for Opus, Sonnet, and Haiku, and what does that imply about model assignment?
- Why might logging or hooks be important when a sub-agent generates and executes code (as happened with Opus)?
- In the three-agent chain (planner/executor/documentation), what model was assigned to each role and what artifact did each role produce?
Key Points
1. Claude Code now allows selecting a specific model for each sub-agent, enabling role-based model specialization.
2. A simple dice-roller sub-agent ran much faster when switched from Opus (~1:05) to Sonnet (~17s) to Haiku (~7.5s).
3. Model choice can change not just speed but also behavior; for example, Opus generated and executed a Python script.
4. Without logging or access to sub-agent internals, it can be hard to verify whether code-based outputs are truly computed versus superficially produced.
5. A chained workflow can map models to roles: Opus for planning, Sonnet for execution/building, and Haiku for documentation/README writing.
6. Chaining agents affects context/token usage; passing planner context increased the primary context window and contributed to a larger total token count.
7. The practical design goal is to reduce time and tokens by assigning the right model to each sub-agent and then chaining them for end-to-end tasks.