How to make Multi-Agent Apps with smolagents
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Multi-agent apps built with smolagents work best when the system leans on tool-calling and strong hosted models—small local “code agents” tend to waste tokens and stumble, while managed tool workflows can reliably complete multi-step tasks.
The walkthrough starts with baseline agent types. A standard code agent can call a hosted Hugging Face model, but it requires a Hugging Face token in environment variables and may only be practical on paid tiers for larger models like Llama 3.3 70B. When the agent is asked to do simple work, it responds quickly, but performance changes sharply once tools enter the picture. Tool-calling agents are positioned as the safer default for structured outputs (often JSON-like), because they can invoke specific tools and then use the returned observations.
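The observation-anchoring idea can be illustrated with a minimal sketch. This is plain Python, not the smolagents API: the stub model, tool registry, and loop (stub_model, TOOLS, run_tool_calling_agent) are all hypothetical names chosen for the example, with a canned tool standing in for a real LLM and real tools.

```python
# Conceptual sketch (NOT the smolagents API): a tool-calling loop where each
# tool result is fed back as an observation that anchors the next step.

def get_weather(city: str) -> str:
    """Stub tool: returns a canned observation instead of calling an API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def stub_model(context: list[str]) -> dict:
    """Stands in for the LLM: call the tool once, then give a final answer."""
    if not any(line.startswith("Observation:") for line in context):
        return {"action": "get_weather", "args": {"city": "Paris"}}
    return {"action": "final_answer", "args": {"text": context[-1]}}

def run_tool_calling_agent(task: str, max_steps: int = 5) -> str:
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        step = stub_model(context)
        if step["action"] == "final_answer":
            return step["args"]["text"]
        observation = TOOLS[step["action"]](**step["args"])
        context.append(f"Observation: {observation}")  # anchors the next step
    return "max steps reached"

print(run_tool_calling_agent("What's the weather in Paris?"))
```

Because every step either calls a named tool or terminates, even a weak model can make measurable progress; a code agent, by contrast, must generate correct executable code at each step, which is where small models stumble.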
Local execution via Ollama highlights the tradeoff. Running tool-calling with Ollama can work, especially with a small model like Llama 3.2 3B or Qwen2.5 Coder 7B, but the system often needs multiple tool/observation steps to reach an answer. That increases token churn and can lead to "wandering" behavior. The transcript is blunt that code agents with smaller models are not consistently reliable; they may churn through elaborate intermediate text and still fail to land cleanly. The practical takeaway: small models can handle simple tool use, but code-agent-style multi-step reasoning is much more fragile unless the model is larger or fine-tuned on function-calling examples.
Switching to proprietary models changes the picture. Claude (noted as an older “3.5” variant) handles code-agent workflows more smoothly, completing search → gather information → synthesize an answer with fewer issues. Gemini Flash can manage tool-calling well, but it’s described as weaker on code-agent tasks, sometimes stumbling before producing a final decision. Across these comparisons, the pattern is consistent: tool agents are robust; code agents are model-sensitive.
The next major capability is turning agents into quick UIs with Gradio. A text-to-image tool is wrapped and launched through Gradio, with “trust_remote_code=True” required for these tools. The agent can decide when to call an image generator tool (e.g., generating “a cat riding a horse”), then return the result to the interface. This makes it straightforward to build small interactive apps where the model selects tools and the UI renders outputs.
From there, the transcript moves into multi-agent construction. Tools are treated as the building blocks, but creating custom tools is “fussy”: decorated tools require precise argument typing and exact parameter naming (e.g., using “args” rather than “parameters”). For multi-agent orchestration, a “managed agent” wraps a tool-calling agent with a name and description, and a “manager agent” (a code agent) directs which managed agents to run and how to chain their outputs.
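The "fussy" requirements can be made concrete with a toy validator. This is illustrative plain Python, not smolagents code: the tool decorator below is a hypothetical reimplementation of the kind of schema check an agent framework performs, and rotten_tomatoes_score is a stub with a hard-coded return value.

```python
# Illustrative sketch (NOT smolagents code): a decorator that rejects tool
# functions missing type hints or a docstring, mirroring the strict schema
# checks that make custom tool creation "fussy".
import inspect

def tool(fn):
    sig = inspect.signature(fn)
    for name, param in sig.parameters.items():
        if param.annotation is inspect.Parameter.empty:
            raise TypeError(f"tool argument '{name}' needs a type hint")
    if not fn.__doc__:
        raise TypeError("tools need a docstring describing their args")
    fn.is_tool = True
    return fn

@tool
def rotten_tomatoes_score(title: str) -> int:
    """Return the Rotten Tomatoes score for a movie title (stub)."""
    return 93  # stubbed: a real tool would scrape the page

# A tool defined without type hints is rejected at definition time:
try:
    @tool
    def bad_tool(title):
        """Missing a type hint on 'title'."""
        return 0
except TypeError as err:
    print(err)  # tool argument 'title' needs a type hint
```

The point of the strictness: the agent builds its tool-call schema from these annotations, so a missing hint or a misnamed field silently breaks the model's ability to call the tool correctly.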
Two multi-agent examples show the payoff. A “movie + Rotten Tomatoes score” workflow uses a web search/scrape agent to find a film released in 1955 starring James Dean, then scrapes Rotten Tomatoes to compute the final score (93%). A more advanced “blog writer” pipeline adds multiple specialized agents: a research agent (web search + scraping), a research checker (no tools, just relevance validation), a writer, and a copy editor. For a prompt like “top five products released at CES 2025 so far,” the system performs multiple scrapes, drafts the blog post, then polishes it—while still depending heavily on tool quality and model strength.
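The manager/managed-agent split can be sketched in plain Python. This is conceptual, not the smolagents API: search_agent and scrape_agent are stubs returning canned strings, the MANAGED_AGENTS registry is a hypothetical structure, and the manager's chain is hard-coded where a real manager (an LLM-driven code agent) would choose it from the descriptions.

```python
# Conceptual sketch (NOT the smolagents API) of a manager agent chaining
# managed agents: each managed agent is a named, described callable, and the
# manager feeds one agent's output into the next.

def search_agent(query: str) -> str:
    return "East of Eden (1955), starring James Dean"  # stubbed web search

def scrape_agent(page: str) -> str:
    return f"Rotten Tomatoes score for {page}: 93%"  # stubbed scraper

MANAGED_AGENTS = {
    "search": {"description": "find films matching a query", "run": search_agent},
    "scrape": {"description": "scrape a score for a found film", "run": scrape_agent},
}

def manager_agent(task: str) -> str:
    # A real manager is itself an LLM that reads the descriptions and plans
    # the chain; here the search -> scrape chain is hard-coded.
    film = MANAGED_AGENTS["search"]["run"](task)
    return MANAGED_AGENTS["scrape"]["run"](film)

print(manager_agent("film released in 1955 starring James Dean"))
```

The blog-writer pipeline is the same pattern with more entries in the registry (research, checker, writer, copy editor) and a longer chain decided by the manager.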
Overall, the guidance is clear: multi-agent apps become dependable when the orchestration is tool-first, the models are strong enough for planning, and custom tools are built carefully for consistent structured inputs/outputs.
Cornell Notes
smolagents multi-agent apps work best when planning is handled by a capable manager model and the work is executed through tool-calling agents. Small local models can call tools, but code-agent workflows often churn tokens and struggle to finish cleanly. Proprietary models like Claude handle code-agent chains more reliably, while Gemini Flash is stronger for tool-calling than for code-agent tasks. Gradio integration makes it easy to turn tool-using agents into interactive apps, including text-to-image generation. Multi-agent orchestration uses managed agents (wrapping tool agents with names/descriptions) and a manager agent that chains steps such as web search, scraping, drafting, and editing.
Why do tool-calling agents tend to outperform code agents when using smaller models?
What changes when moving from Ollama/local models to proprietary models like Claude and Gemini?
How does Gradio fit into the agent workflow?
What makes custom tool creation “fussy” in this framework?
How does the multi-agent orchestration work in practice?
Review Questions
- In what situations does the transcript recommend preferring tool-calling agents over code agents, and what failure mode appears with smaller models?
- Describe the roles of a managed agent and a manager agent in the multi-agent examples. How does the manager decide the next step?
- What specific requirements make custom tools difficult to implement, and why do those requirements matter for multi-agent reliability?
Key Points
1. Tool-calling agents are generally more reliable than code agents when models are small or local, because they can anchor progress on tool observations.
2. Small local code agents (via Ollama) often churn through many tokens and may produce messy intermediate steps before failing or finishing late.
3. Proprietary models like Claude tend to handle code-agent chains more cleanly, while Gemini Flash is strongest for tool-calling workflows.
4. Gradio integration turns tool-using agents into interactive apps quickly, including text-to-image generation via a wrapped image tool (with trust_remote_code=True).
5. Custom tool creation requires precise decorator-compatible argument schemas and correct naming (e.g., using args), or tool calling becomes unreliable.
6. Multi-agent systems chain work by wrapping tool agents as managed agents and using a manager agent to orchestrate multi-hop tasks like search → scrape → synthesize.
7. Multi-agent blog pipelines can be built by splitting responsibilities across specialized agents (research, relevance checking, writing, copy editing) and letting the manager coordinate them.