
Claude Code BIG Update: Assign a Model to Your SubAgents

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Claude Code now allows selecting a specific model for each sub-agent, enabling role-based model specialization.

Briefing

A major Claude Code update now lets users assign specific model choices to individual sub-agents—turning “one model for everything” into a more targeted setup that can cut latency and reduce token waste. In practice, you edit an agent and select its sub-agent model from options like haiku, opus, and sonnet (or have it inherit the parent agent’s model). The practical payoff is speed: the same simple dice-rolling agent ran at dramatically different speeds depending on which model was assigned.

To test the change, a dice roller sub-agent was configured with instructions to simulate rolling a standard six-sided die exactly 10 times and return the results. With the sub-agent set to Opus, the run took about 1 minute and 5 seconds. Switching the same agent to Sonnet cut the runtime to roughly 17 seconds. Moving to Haiku was even faster, landing around 7.5 seconds. The results weren’t treated as a rigorous benchmark—there’s nondeterminism, and the agent behavior varied—but the pattern was clear enough to suggest model selection is a lever for performance.

The tests also highlighted a behavioral difference: Opus took a completely different approach, generating and executing a Python script to perform the dice rolls. That raised a practical question about trust and verification—without logging or access to the sub-agent’s internal source, it’s hard to tell whether outputs are merely “echoed” or truly computed. Still, the runtime gap remained the strongest signal that model assignment matters.

A second experiment chained three specialized sub-agents to build a small project end-to-end. A planner agent used Opus to create a sprint plan. An execution agent used Sonnet to implement the plan by generating a simple HTML app that fetches the current Bitcoin price from the CoinGecko API and displays it in a dark-themed interface with auto-refresh and time/date details. Finally, a documentation agent used Haiku to write the README and project overview. This division matched typical workload needs: heavier reasoning for planning, stronger execution for building, and lightweight summarization for documentation.

During chaining, the workflow also surfaced token-management behavior. When the planner passed context to the main agent, the primary context window grew by roughly 400 tokens, while the overall chain consumed a much larger total—around 15,000 tokens. Even with that overhead, the chained workflow completed successfully, producing the tracker app and a fast README.

Overall, the update reframes agent design as an engineering decision: assign the right model to each sub-agent based on task complexity, then chain them for multi-step outcomes. The immediate promise is faster runs and more efficient token usage, with the added benefit of clearer specialization across planning, execution, and documentation.

Cornell Notes

Claude Code’s update adds model assignment per sub-agent, letting users choose haiku, opus, or sonnet for each role instead of using a single model everywhere. A dice-roller sub-agent ran far faster when switched from Opus (~1:05) to Sonnet (~17s) and then to Haiku (~7.5s), though behavior varied and nondeterminism prevents strict benchmarking. A chained workflow used Opus for sprint planning, Sonnet for building a CoinGecko-based Bitcoin price tracker, and Haiku for writing the README. The setup also changes how much context is carried between agents, with the transcript noting a noticeable context-window increase during chaining. The practical takeaway: model selection becomes a performance and token-efficiency tool when designing multi-agent systems.

What changed in Claude Code that affects how sub-agents run?

Users can edit an agent and explicitly select which model a sub-agent should use (options mentioned include haiku, opus, and sonnet). The workflow also supports inheriting a model from the parent agent, so a sub-agent can either be overridden or follow the main agent’s model. This turns model choice into a per-role configuration rather than a global setting.
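As a concrete sketch of what this per-role configuration looks like, Claude Code sub-agents are defined as markdown files with YAML frontmatter (treat the exact field names here as an assumption based on that format; the transcript only shows the model options, not a file):

```markdown
---
name: dice-roller
description: Simulates rolling a standard six-sided die and reports every result.
model: haiku   # options mentioned: haiku, sonnet, opus, or inherit from the parent
---

You are a dice-rolling agent. When asked, simulate rolling a standard
six-sided die the requested number of times and return all results.
```

Setting `model` in one agent file changes only that role, which is what makes the planner/executor/documentation split later in the article possible.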

How did model choice affect the dice-roller agent’s speed?

The same dice-roller instructions (roll a six-sided die 10 times and return results) were run with different assigned models. Opus took about 1 minute and 5 seconds. Sonnet completed in about 17 seconds. Haiku finished in about 7.5 seconds. The transcript notes the test isn’t scientific because outputs and generated code can vary, but the speed differences were large.

Why did Opus raise a trust/verification concern in the test?

With Opus, the agent generated a Python script and executed it to produce results. That behavior differs from the other model runs, and without logging or access to the sub-agent’s internal source, it’s difficult to verify whether the output reflects real computation versus something else. The user wanted logging/hooks to inspect what happened.
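The transcript doesn’t show the script Opus generated, but a script of that kind would plausibly look like the following (a hedged reconstruction for illustration, not the actual code the agent produced):

```python
import random


def roll_dice(rolls: int = 10, sides: int = 6) -> list[int]:
    """Simulate rolling a standard die `rolls` times and return every result."""
    return [random.randint(1, sides) for _ in range(rolls)]


if __name__ == "__main__":
    results = roll_dice()
    print(f"Rolls: {results}")
    print(f"Total: {sum(results)}")
```

The verification concern is exactly that a reader can’t see whether such a script actually ran, or whether the model simply emitted plausible-looking numbers—hence the request for logging/hooks.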

What role-based model assignment worked well in the chained project example?

A three-agent chain was configured as: (1) sprint planner agent on Opus, (2) execution agent on Sonnet, and (3) documentation agent on Haiku. The planner produced a sprint plan, the executor built an HTML Bitcoin price tracker using the CoinGecko API, and Haiku generated a quick README/project overview. The transcript frames this as a sensible mapping of heavier reasoning to planning, stronger execution to building, and lightweight summarization to documentation.
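The executor’s app centers on a single API call. A minimal Python sketch of that fetch-and-parse step (the CoinGecko `simple/price` endpoint is real; the helper function names are illustrative, and the transcript’s app was HTML/JavaScript rather than Python):

```python
import json
import urllib.request

# CoinGecko's public endpoint for a simple spot price lookup.
COINGECKO_URL = (
    "https://api.coingecko.com/api/v3/simple/price"
    "?ids=bitcoin&vs_currencies=usd"
)


def parse_btc_price(payload: dict) -> float:
    """Extract the USD price from a CoinGecko simple/price response."""
    return float(payload["bitcoin"]["usd"])


def fetch_btc_price(url: str = COINGECKO_URL) -> float:
    """Fetch the current Bitcoin price in USD (performs a network call)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_btc_price(json.load(resp))


if __name__ == "__main__":
    print(f"BTC/USD: {fetch_btc_price():,.2f}")
```

In the transcript’s version, the same call is wrapped in an auto-refreshing dark-themed page that also shows the time and date.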

What token/context behavior showed up when chaining agents?

When the planner agent passed context to the main agent, the transcript notes the primary agent’s context window grew by around 400 tokens. It also mentions the overall chain produced a much larger token count (about 15,000). The key point is that chaining introduces measurable context/token overhead beyond the final output.

Review Questions

  1. In the dice-roller test, what were the approximate runtimes for Opus, Sonnet, and Haiku, and what does that imply about model assignment?
  2. Why might logging or hooks be important when a sub-agent generates and executes code (as happened with Opus)?
  3. In the three-agent chain (planner/executor/documentation), what model was assigned to each role and what artifact did each role produce?

Key Points

  1. Claude Code now allows selecting a specific model for each sub-agent, enabling role-based model specialization.
  2. A simple dice-roller sub-agent ran much faster when switched from Opus (~1:05) to Sonnet (~17s) to Haiku (~7.5s).
  3. Model choice can change not just speed but also behavior—for example, Opus generated and executed a Python script.
  4. Without logging or access to sub-agent internals, it can be hard to verify whether code-based outputs are truly computed versus superficially produced.
  5. A chained workflow can map models to roles: Opus for planning, Sonnet for execution/building, and Haiku for documentation/README writing.
  6. Chaining agents affects context/token usage; passing planner context increased the primary context window and contributed to a larger total token count.
  7. The practical design goal is to reduce time and tokens by assigning the right model to each sub-agent and then chaining them for end-to-end tasks.

Highlights

Assigning models per sub-agent produced large speed swings: Opus (~1:05) vs Sonnet (~17s) vs Haiku (~7.5s) for the same dice-rolling task.
Opus took a notably different route by generating and executing a Python script, raising questions about verification without logging.
A three-agent chain (Opus planner → Sonnet executor → Haiku documentation) successfully built a CoinGecko Bitcoin tracker and generated a README quickly.
Chaining increased context/token usage—context window growth was observed when the planner passed information to the next stage.
