smolagents - HuggingFace's NEW Agent Framework
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Hugging Face’s new “smolagents” framework pushes agent building toward “code agents”: instead of forcing an LLM to emit JSON-style plans, it can write and run Python in a sandbox to decide what to do next. The practical payoff is a simpler path from prompt to action—often just “model + tools”—while still keeping guardrails through restricted, sandboxed imports and tool access.
The framework’s first big differentiator is how it defines “agency.” It positions agents on a spectrum: from tightly constrained tool-calling loops (safer, more predictable) to higher-agency setups that let models take multi-step actions. smolagents leans into the middle ground by supporting both tool-calling agents and code agents, with code agents designed to let the model “think in code” and execute it step-by-step. This direction draws on prior research showing benefits from letting models execute code (e.g., via Python) and feed back structured results, rather than relying purely on text or JSON.
On the model side, smolagents is built around Hugging Face Hub models, with the out-of-the-box default being Qwen2.5-Coder-32B-Instruct. Access depends on your Hugging Face account tier, but the framework also supports proprietary models through LiteLLM, enabling OpenAI- and Anthropic-style backends. In practice, the setup is minimal: import a code agent, provide a model wrapper, and register tools. A built-in example computes the cube root of 27 by running Python directly, with no external search tool needed.
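A minimal sketch of that setup, using class names from early smolagents releases (they may differ in later versions):

```python
# pip install smolagents
from smolagents import CodeAgent, HfApiModel

# Default Hub model wrapper; at the time of the video this resolved to
# Qwen2.5-Coder-32B-Instruct
model = HfApiModel()

# No external tools registered: the agent can still write and run Python
# inside its sandbox
agent = CodeAgent(tools=[], model=model)

# Answered by generating and executing Python, e.g. 27 ** (1 / 3)
agent.run("What is the cube root of 27?")
```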
Where the framework becomes most revealing is in real tasks that require external data and reasoning. For a route-time question (“drive from Melbourne to Sydney”), the agent switches to web search, extracts distance/time candidates from results, and produces an estimated range. For a more complex finance scenario (“buy Bitcoin with $11,000 to reach $1 million”), the agent attempts to fetch historical prices and write code to compute the answer, but it fails repeatedly due to sandbox restrictions—certain Python libraries (like requests) and even JSON handling are not authorized by default. The agent then falls back to less reliable strategies (web queries and printed outputs) and hits a maximum iteration limit, consuming substantial token budgets along the way.
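A hedged sketch of the web-search variant (the question wording is illustrative, and depending on your install the search tool may require an extra dependency such as the duckduckgo-search package):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A search tool lets the code agent pull external data, then post-process
# the results in generated Python
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run("How long does it take to drive from Melbourne to Sydney?")
```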
The transcript also highlights how to tune the sandbox: authorized imports can be expanded (e.g., adding requests, bs4, or math), and system prompts can be overridden to change agent behavior. Even with tuning, the agent may still struggle with errors when assembling tables or performing multi-step data work, suggesting that reliability depends heavily on allowed libraries, iteration limits, and prompt/model fit.
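A sketch of those levers, assuming the `additional_authorized_imports` and `max_steps` parameters from early smolagents releases (the system prompt override mechanism varies by version, so it is omitted here):

```python
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    # Widen the sandbox: permit imports beyond the default safe list
    additional_authorized_imports=["requests", "bs4", "math"],
    # Cap the agent's reasoning/execution loop to bound token spend
    max_steps=10,
)
```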
Finally, smolagents supports traditional tool-calling patterns (React-like loops) and custom tools defined by developers, including specifying input/output schemas and pushing tools to the Hugging Face Hub for reuse. The overall message is that smolagents makes agent experimentation faster and more flexible, but code-agent reliability still hinges on sandbox permissions and error-handling—especially for data-heavy tasks.
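As a sketch of the custom-tool pattern under the `@tool` decorator (the tool name and body here are hypothetical; smolagents infers the input/output schema from the type hints and docstring):

```python
from smolagents import tool

@tool
def get_btc_price(date: str) -> float:
    """Returns the Bitcoin closing price in USD for a given date.

    Args:
        date: Date in YYYY-MM-DD format.
    """
    # Placeholder body; a real tool would query a price API here
    return 0.0

# The decorated function becomes a Tool object that can be passed to an
# agent's tools list or shared on the Hub, e.g.:
# get_btc_price.push_to_hub("your-username/get-btc-price")
```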
Cornell Notes
smolagents is a Hugging Face library for building agents, with a standout focus on “code agents” that can write and execute Python in a sandbox. It supports both code agents and tool-calling agents, letting developers choose how much agency to grant. The framework is designed to be easy to start with, often just a model plus tools, using Hugging Face Hub models by default (Qwen2.5-Coder-32B-Instruct) or proprietary models via LiteLLM. In demonstrations, simple math works well, while data-heavy tasks can fail when the sandbox blocks needed libraries (e.g., requests) or when iteration limits are reached. Custom tools, authorized imports, and system prompt overrides are key levers for improving outcomes.
What does “code agent” mean in smolagents, and how is it different from JSON-style agents?
How does smolagents handle model choice—Hugging Face Hub vs proprietary models?
Why did the Bitcoin investment example fail, and what does that reveal about sandboxing?
What knobs can improve code-agent performance in smolagents?
How do tool-calling agents and custom tools fit alongside code agents?
What role does memory play when an agent makes repeated mistakes?
Review Questions
- In smolagents, what are the practical consequences of restricting authorized imports in the sandbox?
- Compare the failure modes of the cube-root task versus the Bitcoin investment task—what changed in the agent’s required capabilities?
- How would you design a custom tool (inputs/outputs) to reduce reliance on code execution for a data-heavy workflow?
Key Points
1. smolagents introduces “code agents” that can write and execute Python in a sandbox, aiming to make agent reasoning more direct than JSON-only planning.
2. The framework supports a spectrum of agency, including both code agents and tool-calling agents, so developers can choose between safety and flexibility.
3. Default model usage centers on Hugging Face Hub models (including Qwen2.5-Coder-32B-Instruct), while proprietary models are supported via LiteLLM (e.g., GPT-4o); see the sketch after this list.
4. Sandbox restrictions on authorized imports (such as blocking requests) can cause data-fetching and table-building tasks to fail, even when the model attempts multiple strategies.
5. Authorized imports, system prompt overrides, and model selection are key levers for improving reliability and reducing repeated errors.
6. Custom tools can be defined with explicit input/output schemas and shared via the Hugging Face Hub for reuse.
7. Token usage can spike during multi-step code-agent runs, especially when the agent hits max iterations after repeated failures.
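A hedged sketch of the proprietary-model path via LiteLLM (model IDs follow LiteLLM's naming conventions, and an API key is assumed to be set in the environment):

```python
from smolagents import CodeAgent, LiteLLMModel

# LiteLLMModel routes requests through LiteLLM, so OpenAI- or
# Anthropic-style backends can stand in for Hub-hosted models
# (requires OPENAI_API_KEY in the environment for this model ID)
model = LiteLLMModel(model_id="gpt-4o")

agent = CodeAgent(tools=[], model=model)
agent.run("What is the cube root of 27?")
```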