Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small “routines” that can hand off an active conversation to other specialized agents. That handoff mechanism—more than any single model feature—lets developers cascade tasks across multiple agents without stuffing every tool and instruction into one monolithic assistant.
Swarm’s design centers on two concepts. First are routines: natural-language instruction lists that become an agent’s system prompt, often formatted as bullet points or numbered steps. Those instructions don’t just tell the agent what to do; they also guide which tools the agent should use. The approach resembles common agent patterns—give an agent instructions plus tool access, then let it decide how to proceed—but Swarm frames it in a more modular way.
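The routine idea can be sketched in a few lines. This is a toy illustration under assumed names, not the real openai/swarm API: the `Agent` dataclass and `build_messages` helper are hypothetical stand-ins showing how a numbered natural-language routine ends up as the system prompt on every turn.

```python
# Toy sketch of a Swarm-style "routine": numbered natural-language steps
# that become an agent's system prompt. Illustrative names only, not the
# real openai/swarm library.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str                      # the routine, used verbatim as the system prompt
    tools: list = field(default_factory=list)

sales_routine = """You are a honey sales agent. Follow these steps in order:
1. Ask for the customer's name.
2. Ask about any health concerns honey could address.
3. Address objections, then close the sale."""

honey_agent = Agent(name="Honey Sales Agent", instructions=sales_routine)

def build_messages(agent: Agent, history: list) -> list:
    # The routine is injected as the first (system) message of every turn,
    # so the model follows the same sequence across the whole conversation.
    return [{"role": "system", "content": agent.instructions}] + history

msgs = build_messages(honey_agent, [{"role": "user", "content": "Hi!"}])
```

Because the routine travels with the agent rather than with the conversation, swapping agents (a handoff) automatically swaps the system prompt while the user-visible history stays put.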
Second is handoff. Instead of one agent trying to handle everything, Swarm encourages building many small agents, each with its own system prompt and toolset. A “master” or “triage” agent can transfer the conversation to another agent, and the receiving agent continues with full knowledge of prior context—analogous to a phone call transfer, but with the conversation history carried over. This enables clean delegation flows like sales → refunds, or triage → flight modification → flight change, while keeping each agent’s responsibilities narrow.
The transcript emphasizes Swarm’s practical tradeoffs. It is currently oriented toward OpenAI models; it can be hacked to work with Ollama and other open models, but with “mixed success.” It also differs from frameworks such as LangGraph, CrewAI, and AutoGen by leaning on conversation memory rather than a more elaborate state system. That choice makes Swarm easier to inspect and reason about, but it also raises the risk of loops and rules out the kind of built-in memory and reflection workflows seen in heavier orchestration frameworks.
Code examples illustrate how the pieces fit together. Agents are instantiated with a name and instructions, which effectively become the system prompt. Routines add step-by-step structure—like a honey sales pitch that asks for the customer’s name, probes health concerns, addresses objections, and closes the deal—so the model follows a predictable sequence across multiple turns.
Handoffs are implemented as functions the agent can call. The examples include language-specific agents (English and Spanish) that transfer based on the user’s language, plus a “cat agent” that enforces a response ending in “Meow meow Meow.” Another example builds an airline support flow with a triage agent that delegates to flight modification, cancellation, change, or baggage handling, while injecting customer context variables like customer ID and upcoming flight details.
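The handoff mechanism can be sketched with a toy runner. All names here are hypothetical; in the real openai/swarm runtime the model itself decides to call a transfer function, and the framework swaps the active agent whenever a called function returns an `Agent`. This sketch makes that swap explicit by passing the chosen function in by hand.

```python
# Toy sketch of handoff-as-function-call: a transfer function returns the
# target Agent, and the loop swaps the active agent while the full message
# history travels with the transfer. Not the real openai/swarm runtime.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)

spanish_agent = Agent(name="Spanish Agent", instructions="Reply only in Spanish.")

def transfer_to_spanish_agent():
    """Hand the conversation off to the Spanish-speaking agent."""
    return spanish_agent

triage_agent = Agent(
    name="Triage Agent",
    instructions="Detect the user's language and transfer if it is not English.",
    functions=[transfer_to_spanish_agent],
)

def run_turn(agent: Agent, history: list, chosen_function=None):
    # In real Swarm the model picks the function; here the caller does.
    if chosen_function is not None:
        result = chosen_function()
        if isinstance(result, Agent):   # a returned Agent signals "handoff"
            return result, history      # history is carried over unchanged
    return agent, history

history = [{"role": "user", "content": "Hola, ¿cómo estás?"}]
active, history = run_turn(triage_agent, history, transfer_to_spanish_agent)
```

After the transfer, `active` is the Spanish agent, whose own instructions become the system prompt for subsequent turns, which is how the examples enforce agent-specific behavior (like the cat agent’s mandatory sign-off) post-handoff.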
Beyond orchestration, Swarm supports variable injection into prompts (similar to RAG-style context insertion) and tool calling for actions such as weather lookup, sending emails, processing refunds, and checking flight eligibility. A custom search agent demonstrates tool-driven retrieval via DuckDuckGo search tools, then rephrasing tool outputs into user-friendly responses—highlighting the need for an explicit “reasoner” step so the assistant doesn’t just echo raw tool results.
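Context-variable injection and the rephrasing step can be sketched together, in the spirit of the airline example. Every name here (`triage_instructions`, `check_change_eligibility`, the context keys) is an illustrative assumption, not the real openai/swarm API; the point is that instructions can be built from runtime data, and that raw tool output should be turned into a user-facing sentence rather than echoed back.

```python
# Sketch of context-variable injection plus a tool call. Hypothetical
# names throughout; not the real openai/swarm API.
def triage_instructions(context_variables: dict) -> str:
    # The system prompt is built from runtime context, so the agent
    # already "knows" the customer before the first turn.
    customer = context_variables.get("customer_id", "unknown")
    flight = context_variables.get("flight", "no flight on record")
    return (
        f"You are an airline triage agent. Customer ID: {customer}. "
        f"Upcoming flight: {flight}. Route requests to the right agent."
    )

def check_change_eligibility(flight: str) -> dict:
    """Toy tool: a flight is changeable unless it is a basic-economy fare."""
    return {"flight": flight, "changeable": "basic" not in flight.lower()}

ctx = {"customer_id": "CUS-1234", "flight": "LH100 FRA->SFO"}
system_prompt = triage_instructions(ctx)

# The "reasoner" step: rephrase the raw tool result for the user instead
# of returning the dict/JSON verbatim.
raw = check_change_eligibility(ctx["flight"])
reply = (
    f"Good news: flight {raw['flight']} can be changed."
    if raw["changeable"]
    else f"Sorry, flight {raw['flight']} is not eligible for changes."
)
```

The same shape applies to the DuckDuckGo search agent: the tool returns raw snippets, and a final rephrasing pass produces the user-friendly answer.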
Overall, Swarm’s appeal in the transcript is its modular “routines + handoffs” architecture: cascade responsibilities across agents, keep toolsets small, and use transfers to maintain focus. The main caution is the lack of a robust state/memory system, which can complicate long-running workflows and loop prevention compared with more stateful frameworks.
Cornell Notes
Swarm organizes agent behavior around two building blocks: routines and handoffs. Routines are natural-language instruction lists that become an agent’s system prompt, often structured as numbered steps, guiding both what to do and which tools to use. Handoffs let one agent transfer the active conversation to another specialized agent, preserving prior context while switching toolsets and instructions. This makes multi-agent systems easier to modularize—triage delegates to sales or refunds, or airline support cascades through flight modification steps. The tradeoff is a lighter approach to state and memory, which increases the risk of loops and reduces the built-in workflow features found in heavier frameworks like LangGraph.
What exactly is a “routine” in Swarm, and how does it differ from a simple agent instruction prompt?
How does handoff work, and why does it matter for multi-agent systems?
Why does the transcript claim Swarm is “lightweight,” and what are the consequences?
How do context variables and tool calling fit into Swarm’s agent design?
What’s the practical lesson from the custom DuckDuckGo search agent about tool outputs?
How do the examples enforce agent-specific behavior after a handoff?
Review Questions
- In Swarm, how do routines and handoffs work together to create a multi-step workflow without one agent holding all tools?
- What risks arise from Swarm’s lighter state/memory approach, and how might a developer mitigate loop behavior?
- Why is it important for an agent to rephrase tool outputs rather than returning raw JSON or tool text directly?
Key Points
1. Swarm’s architecture is built around routines (structured instruction lists) and handoffs (conversation transfers to specialized agents).
2. Routines turn natural-language steps into an agent’s system prompt, guiding both behavior and tool usage across multiple turns.
3. Handoffs preserve conversation context while switching to a new agent’s system prompt and toolset, enabling cascaded delegation flows.
4. Swarm is currently oriented toward OpenAI models; adapting it to other model runtimes like Ollama may work but can be unreliable.
5. Swarm relies more on conversation memory than a full state system, which can simplify debugging but increases loop risk for complex workflows.
6. Tool calling and context-variable injection let agents perform actions (weather, email, refunds, flight changes) and personalize responses using runtime data.
7. A separate reasoning/rephrasing step is important so the assistant turns tool outputs into user-friendly answers instead of echoing raw results.