
How to make Multi-Agent Apps with smolagents

Sam Witteveen · 5 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Tool-calling agents are generally more reliable than code agents when models are small or local, because they can anchor progress on tool observations.

Briefing

Multi-agent apps built with smolagents work best when the system leans on tool-calling and strong hosted models—small local “code agents” tend to waste tokens and stumble, while managed tool workflows can reliably complete multi-step tasks.

The walkthrough starts with baseline agent types. A standard code agent can call a hosted Hugging Face model, but it requires a Hugging Face token in environment variables and may only be practical on paid tiers for larger models like Llama 3.3 70B. When the agent is asked to do simple work, it responds quickly, but performance changes sharply once tools enter the picture. Tool-calling agents are positioned as the safer default for structured outputs (often JSON-like), because they can invoke specific tools and then use the returned observations.
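
To make the baseline concrete, here is a minimal sketch assuming the smolagents CodeAgent/ToolCallingAgent classes and the HfApiModel wrapper from the version used in the video; the model ID and prompt are illustrative.

    import os
    from smolagents import CodeAgent, ToolCallingAgent, HfApiModel, DuckDuckGoSearchTool

    # Hosted inference needs a Hugging Face token in the environment;
    # larger models like Llama 3.3 70B may require a paid tier.
    assert os.environ.get("HF_TOKEN"), "Set HF_TOKEN before running"

    model = HfApiModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

    # Code agent: writes and executes Python snippets to solve the task.
    code_agent = CodeAgent(tools=[], model=model, add_base_tools=True)

    # Tool-calling agent: emits structured (JSON-like) tool calls instead.
    tool_agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)

    print(tool_agent.run("What year was the transistor invented?"))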

Local execution via Ollama highlights the tradeoff. Running tool-calling with Ollama can work, especially with a small model like Llama 3.2 3B or Qwen2.5 Coder 7B, but the system often needs multiple tool/observation steps to reach an answer. That increases token churn and can lead to "wandering" behavior. The transcript is blunt that code agents with smaller models are not consistently reliable; they may churn through elaborate intermediate text and still fail to land cleanly. The practical takeaway: small models can handle simple tool use, but code-agent-style multi-step reasoning is much more fragile unless the model is larger or fine-tuned on function-calling examples.
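
A hedged sketch of the local setup, assuming smolagents' LiteLLMModel wrapper pointed at a local Ollama server (the model tag and endpoint are assumptions; adjust them to whatever you have pulled):

    from smolagents import ToolCallingAgent, LiteLLMModel, DuckDuckGoSearchTool

    # Route through LiteLLM to Ollama; "ollama_chat/llama3.2" assumes
    # you have already run `ollama pull llama3.2` (the 3B model).
    model = LiteLLMModel(
        model_id="ollama_chat/llama3.2",
        api_base="http://localhost:11434",  # default Ollama endpoint
    )

    # Tool-calling keeps each step anchored on a concrete observation,
    # which is the safer pattern for small local models.
    agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)
    agent.run("Who won the 2024 Nobel Prize in Physics?")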

Switching to proprietary models changes the picture. Claude (noted as an older “3.5” variant) handles code-agent workflows more smoothly, completing search → gather information → synthesize an answer with fewer issues. Gemini Flash can manage tool-calling well, but it’s described as weaker on code-agent tasks, sometimes stumbling before producing a final decision. Across these comparisons, the pattern is consistent: tool agents are robust; code agents are model-sensitive.

The next major capability is turning agents into quick UIs with Gradio. A text-to-image tool is wrapped and launched through Gradio, with “trust_remote_code=True” required for these tools. The agent can decide when to call an image generator tool (e.g., generating “a cat riding a horse”), then return the result to the interface. This makes it straightforward to build small interactive apps where the model selects tools and the UI renders outputs.
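
A minimal sketch of that flow, assuming smolagents' load_tool helper and its bundled GradioUI wrapper; the Hub repo ID ("m-ric/text-to-image") follows the smolagents documentation example and is an assumption about the exact tool used in the video:

    from smolagents import CodeAgent, HfApiModel, GradioUI, load_tool

    # Community tools ship their own code, hence trust_remote_code=True.
    image_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)

    agent = CodeAgent(tools=[image_tool], model=HfApiModel())

    # Launch a chat UI; the agent decides when a prompt warrants an image.
    GradioUI(agent).launch()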

From there, the transcript moves into multi-agent construction. Tools are treated as the building blocks, but creating custom tools is “fussy”: decorated tools require precise argument typing and exact parameter naming (e.g., using “args” rather than “parameters”). For multi-agent orchestration, a “managed agent” wraps a tool-calling agent with a name and description, and a “manager agent” (a code agent) directs which managed agents to run and how to chain their outputs.
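
For the tool side, a sketch of the required shape, assuming smolagents' @tool decorator, which expects a type hint on every argument and a docstring whose Args: section names each parameter exactly (the score lookup here is a hypothetical placeholder):

    from smolagents import tool

    @tool
    def get_rotten_tomatoes_score(movie_title: str) -> str:
        """Returns the Rotten Tomatoes critic score for a movie.

        Args:
            movie_title: The exact title of the movie to look up.
        """
        # Hypothetical placeholder; a real tool would scrape or call an API here.
        return f"Critic score for {movie_title}: unknown"

Naming the docstring section Args: (not Parameters:) and matching each argument name exactly is what makes the decorator's validation pass.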

Two multi-agent examples show the payoff. A “movie + Rotten Tomatoes score” workflow uses a web search/scrape agent to find a film released in 1955 starring James Dean, then scrapes Rotten Tomatoes to compute the final score (93%). A more advanced “blog writer” pipeline adds multiple specialized agents: a research agent (web search + scraping), a research checker (no tools, just relevance validation), a writer, and a copy editor. For a prompt like “top five products released at CES 2025 so far,” the system performs multiple scrapes, drafts the blog post, then polishes it—while still depending heavily on tool quality and model strength.
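
The orchestration itself can be sketched as follows, assuming the ManagedAgent wrapper from the smolagents release used in the video (newer versions drop ManagedAgent and instead accept named sub-agents directly); the names, descriptions, and prompt are illustrative:

    from smolagents import (
        CodeAgent, ToolCallingAgent, ManagedAgent, HfApiModel, DuckDuckGoSearchTool,
    )

    model = HfApiModel()

    # Worker: a tool-calling agent that can search the web.
    web_agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=model)

    # Wrap it with a name/description so the manager knows when to call it.
    managed_web = ManagedAgent(
        agent=web_agent,
        name="web_search",
        description="Searches the web. Pass the query as an argument.",
    )

    # Manager: a code agent that decides which managed agents to run
    # and chains their outputs (find the film, then fetch its score).
    manager = CodeAgent(tools=[], model=model, managed_agents=[managed_web])

    manager.run(
        "Find the film released in 1955 starring James Dean and report "
        "its Rotten Tomatoes score."
    )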

Overall, the guidance is clear: multi-agent apps become dependable when the orchestration is tool-first, the models are strong enough for planning, and custom tools are built carefully for consistent structured inputs/outputs.

Cornell Notes

smolagents multi-agent apps work best when planning is handled by a capable manager model and the work is executed through tool-calling agents. Small local models can call tools, but code-agent workflows often churn tokens and struggle to finish cleanly. Proprietary models like Claude handle code-agent chains more reliably, while Gemini Flash is stronger for tool-calling than for code-agent tasks. Gradio integration makes it easy to turn tool-using agents into interactive apps, including text-to-image generation. Multi-agent orchestration uses managed agents (wrapping tool agents with names/descriptions) and a manager agent that chains steps such as web search, scraping, drafting, and editing.

Why do tool-calling agents tend to outperform code agents when using smaller models?

Tool-calling agents can invoke a specific tool and then use the returned observation to proceed, often requiring fewer “wandering” intermediate steps. In the transcript, Ollama-based runs with small models can successfully perform tool calls, but code-agent behavior is described as less reliable: it may produce elaborate intermediate output, churn through many tokens, and still stumble. The more dependable pattern is: keep the model focused on selecting tools and consuming tool outputs rather than doing complex multi-step code-style reasoning on its own.

What changes when moving from Ollama/local models to proprietary models like Claude and Gemini?

Proprietary models handle multi-step agent chains more smoothly. Claude (noted as an older 3.5 variant) completes search → gather information → synthesize answers quickly in code-agent workflows. Gemini Flash handles tool agents effectively and often reaches final answers with fewer observations, but it's described as weaker on code-agent tasks, sometimes stumbling before producing a result.
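
Swapping backends is mostly a one-line change; a sketch assuming smolagents' LiteLLMModel and LiteLLM's provider-prefixed model IDs (the exact ID strings are assumptions and depend on your LiteLLM version):

    from smolagents import CodeAgent, LiteLLMModel

    # Anthropic Claude (expects ANTHROPIC_API_KEY in the environment).
    claude = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-20240620")

    # Google Gemini Flash (expects GEMINI_API_KEY in the environment).
    gemini = LiteLLMModel(model_id="gemini/gemini-2.0-flash")

    # Same agent code either way; only the planner strength changes.
    agent = CodeAgent(tools=[], model=claude, add_base_tools=True)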

How does Gradio fit into the agent workflow?

Gradio provides a simple UI layer around tools. The transcript demonstrates importing a text-to-image tool and launching a Gradio app, with trust_remote_code=True required. Users type prompts like "make me an image of a cat riding a horse," and the agent decides to call the image generator tool (running the FLUX model on Hugging Face servers) and then returns the generated image to the interface.

What makes custom tool creation “fussy” in this framework?

Decorated tools require exact metadata: the tool description and argument schema must match what the decorator expects. The transcript emphasizes that the framework wants specific typing and correct argument naming—using “args” correctly rather than alternatives like “parameters”—and that mismatches can break tool behavior. Getting these details right is necessary for consistent tool calling and downstream multi-agent chaining.

How does the multi-agent orchestration work in practice?

A tool-calling agent is wrapped into a "managed agent" with a name and description (e.g., a web search agent that runs DuckDuckGo search and scrapes content via Jina AI). A "manager agent" (a code agent) then directs which managed agents to run and how to chain their outputs. In the movie example, the manager first identifies the correct film, then triggers scraping for Rotten Tomatoes, and finally composes a structured answer including the Rotten Tomatoes score (93%).

Review Questions

  1. In what situations does the transcript recommend preferring tool-calling agents over code agents, and what failure mode appears with smaller models?
  2. Describe the roles of a managed agent and a manager agent in the multi-agent examples. How does the manager decide the next step?
  3. What specific requirements make custom tools difficult to implement, and why do those requirements matter for multi-agent reliability?

Key Points

  1. Tool-calling agents are generally more reliable than code agents when models are small or local, because they can anchor progress on tool observations.
  2. Small local code agents (via Ollama) often churn through many tokens and may produce messy intermediate steps before failing or finishing late.
  3. Proprietary models like Claude tend to handle code-agent chains more cleanly, while Gemini Flash is strongest for tool-calling workflows.
  4. Gradio integration turns tool-using agents into interactive apps quickly, including text-to-image generation via a wrapped image tool (with trust_remote_code=True).
  5. Custom tool creation requires precise decorator-compatible argument schemas and correct naming (e.g., using args), or tool calling becomes unreliable.
  6. Multi-agent systems chain work by wrapping tool agents as managed agents and using a manager agent to orchestrate multi-hop tasks like search → scrape → synthesize.
  7. Multi-agent blog pipelines can be built by splitting responsibilities across specialized agents (research, relevance checking, writing, copy editing) and letting the manager coordinate them.

Highlights

Small-model code agents can work for simple tasks, but they frequently waste tokens and stumble during multi-step tool/observation loops.
Claude handles code-agent workflows with fewer issues than Gemini Flash, while Gemini Flash excels at tool-calling.
A Gradio UI can be driven entirely by agent tool decisions—typing a prompt can trigger automatic tool selection and image generation.
Multi-agent orchestration relies on managed agents (tool wrappers) plus a manager agent that chains steps across web search, scraping, and synthesis.
Custom tools are fragile unless argument typing and decorator expectations are followed exactly.

Topics

Mentioned

  • RAG
  • JSON
  • UI
  • API
  • pip