
Introducing Swarm with Code Examples: OpenAI's Groundbreaking Agent Framework

Sam Witteveen · 6 min read

Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Swarm’s architecture is built around routines (structured instruction lists) and handoffs (conversation transfers to specialized agents).

Briefing

OpenAI’s Swarm has landed as a lightweight framework for building multi-agent systems, and the core idea is simple: model behavior as small “routines” that can hand off an active conversation to other specialized agents. That handoff mechanism—more than any single model feature—lets developers cascade tasks across multiple agents without stuffing every tool and instruction into one monolithic assistant.

Swarm’s design centers on two concepts. First are routines: natural-language instruction lists that become an agent’s system prompt, often formatted as bullet points or numbered steps. Those instructions don’t just tell the agent what to do; they also guide which tools the agent should use. The approach resembles common agent patterns—give an agent instructions plus tool access, then let it decide how to proceed—but Swarm frames it in a more modular way.
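The routine-to-system-prompt idea can be sketched without the Swarm library at all. The `Agent` class below is a hypothetical stand-in for Swarm's own (which is likewise constructed with a name and an instructions string), showing how a numbered routine simply becomes the first message of every model call:

```python
from dataclasses import dataclass, field

# Minimal stand-in for Swarm's Agent: in the real library an agent is
# created with a name and instructions, and the instructions string is
# used verbatim as the system prompt.
@dataclass
class Agent:
    name: str
    instructions: str               # the routine, used as the system prompt
    functions: list = field(default_factory=list)

# A routine is just a numbered list of natural-language steps.
sales_routine = """You are an enthusiastic honey salesperson. Follow these steps:
1. Ask for the customer's name.
2. Ask about health concerns such as allergies or jet lag.
3. Pitch the benefits of honey for those concerns.
4. Address any objections.
5. Close the sale and thank the customer by name.
"""

sales_agent = Agent(name="Sales Agent", instructions=sales_routine)

# Simplified version of what the framework does on each turn:
# prepend the routine to the conversation history.
def build_messages(agent, history):
    return [{"role": "system", "content": agent.instructions}] + history

msgs = build_messages(sales_agent, [{"role": "user", "content": "Hi!"}])
```

Because the routine is ordinary prose, tightening or reordering agent behavior is just an edit to that string, with no orchestration code involved.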

Second is handoff. Instead of one agent trying to handle everything, Swarm encourages building many small agents, each with its own system prompt and toolset. A “master” or “triage” agent can transfer the conversation to another agent, and the receiving agent continues with full knowledge of prior context—analogous to a phone call transfer, but with the conversation history carried over. This enables clean delegation flows like sales → refunds, or triage → flight modification → flight change, while keeping each agent’s responsibilities narrow.
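The handoff mechanism can be modeled in a few lines of plain Python. In Swarm a handoff is a tool function that simply returns another agent; the sketch below (a simplified stand-in, not the library's actual loop) shows how the active agent is swapped while the conversation history is carried over unchanged:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)

refunds_agent = Agent(name="Refunds", instructions="Handle refund requests.")

# A handoff is just a function that returns another Agent; when the model
# calls it, the framework makes that agent the active one.
def transfer_to_refunds():
    return refunds_agent

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist agent.",
    functions=[transfer_to_refunds],
)

# Simplified framework step: if a tool call returns an Agent, hand off to
# it; the history is preserved, only the system prompt and toolset change.
def run_turn(agent, history, tool_call=None):
    if tool_call is not None:
        result = tool_call()
        if isinstance(result, Agent):
            return result, history
    return agent, history

history = [{"role": "user", "content": "I want my money back"}]
active, history = run_turn(triage_agent, history, tool_call=transfer_to_refunds)
```

The "phone transfer with context" analogy falls out of the last line: `history` is passed through untouched, so the refunds agent sees everything the triage agent saw.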

The transcript emphasizes Swarm’s practical tradeoffs. It’s currently oriented toward OpenAI models, with the possibility of hacking it toward Ollama and other open models but with “mixed success.” It also differs from frameworks such as LangGraph, CrewAI, and AutoGen by leaning on conversation memory rather than a more elaborate state system. That choice makes Swarm easier to inspect and reason about, but it also raises the risk of loops and limits the kind of built-in memory/reflection workflows seen in heavier orchestration frameworks.

Code examples illustrate how the pieces fit together. Agents are instantiated with a name and instructions, which effectively become the system prompt. Routines add step-by-step structure—like a honey sales pitch that asks for the customer’s name, probes health concerns, addresses objections, and closes the deal—so the model follows a predictable sequence across multiple turns.

Handoffs are implemented as functions the agent can call. The examples include language-specific agents (English and Spanish) that transfer based on the user’s language, plus a “cat agent” that enforces a response ending in “Meow meow Meow.” Another example builds an airline support flow with a triage agent that delegates to flight modification, cancellation, change, or baggage handling, while injecting customer context variables like customer ID and upcoming flight details.

Beyond orchestration, Swarm supports variable injection into prompts (similar to RAG-style context insertion) and tool calling for actions such as weather lookup, sending emails, processing refunds, and checking flight eligibility. A custom search agent demonstrates tool-driven retrieval via DuckDuckGo search tools, then rephrasing tool outputs into user-friendly responses—highlighting the need for an explicit “reasoner” step so the assistant doesn’t just echo raw tool results.
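Variable injection can be sketched as instructions that are a function of runtime context rather than a fixed string (Swarm supports this shape; the helper names below are illustrative, not the library's API):

```python
# Instructions written as a function of context_variables, so runtime data
# like the customer's name and ID is injected into the system prompt.
def support_instructions(context_variables):
    name = context_variables.get("name", "the customer")
    customer_id = context_variables.get("customer_id", "unknown")
    return (
        f"You are a support agent helping {name} (customer ID {customer_id}). "
        "Greet them by name before answering."
    )

# Simplified injection step: resolve callable instructions against the
# current context before building the system prompt.
def resolve_instructions(instructions, context_variables):
    if callable(instructions):
        return instructions(context_variables)
    return instructions

prompt = resolve_instructions(
    support_instructions, {"name": "Ada", "customer_id": "C-1234"}
)
```

This is the same move as RAG-style context insertion, just with structured runtime variables instead of retrieved documents.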

Overall, Swarm’s appeal in the transcript is its modular “routines + handoffs” architecture: cascade responsibilities across agents, keep toolsets small, and use transfers to maintain focus. The main caution is the lack of a robust state/memory system, which can complicate long-running workflows and loop prevention compared with more stateful frameworks.

Cornell Notes

Swarm organizes agent behavior around two building blocks: routines and handoffs. Routines are natural-language instruction lists that become an agent’s system prompt, often structured as numbered steps, guiding both what to do and which tools to use. Handoffs let one agent transfer the active conversation to another specialized agent, preserving prior context while switching toolsets and instructions. This makes multi-agent systems easier to modularize—triage delegates to sales or refunds, or airline support cascades through flight modification steps. The tradeoff is a lighter approach to state and memory, which can increase the risk of loops and reduces the built-in workflow features found in heavier frameworks like LangGraph.

What exactly is a “routine” in Swarm, and how does it differ from a simple agent instruction prompt?

A routine is a structured set of steps written in natural language that becomes the agent’s system prompt. In the examples, it’s often formatted as numbered instructions (e.g., ask for the customer’s name, identify health concerns like allergies or jet lag, pitch honey’s benefits, handle objections, close the sale, and thank the customer). Unlike a single-line instruction such as “be super enthusiastic,” routines impose an ordered sequence that the agent follows across multiple turns, effectively turning prompting into a step-by-step workflow.

How does handoff work, and why does it matter for multi-agent systems?

Handoff is implemented as a callable function that transfers the active conversation from one agent/routine to another. The receiving agent keeps full knowledge of the prior conversation, but it switches to its own system prompt and toolset. This matters because it enables cascaded delegation: a triage agent can route to a sales agent or a refunds agent, or an airline triage agent can route to flight modification, then to flight change/cancel, without forcing one agent to carry every tool and instruction.

Why does the transcript claim Swarm is “lightweight,” and what are the consequences?

Swarm is described as lightweight because it focuses on conversation memory rather than a richer state system. That makes flows easier to inspect and keeps orchestration simpler than frameworks that manage more explicit state, reflection steps, and memory behaviors. The consequence is that long-running interactions can be more prone to loops, and developers may need to add their own safeguards and state handling compared with more stateful frameworks like LangGraph.

How do context variables and tool calling fit into Swarm’s agent design?

Context variables are injected into prompts at runtime (similar in spirit to RAG context injection), letting the agent tailor responses using user-specific data like a name or user ID. Tool calling then lets the agent perform actions: for example, a weather agent uses a weather tool when asked “What is the weather in Paris?”; a refund flow uses tools that require structured inputs like item IDs. In the examples, tool outputs are returned to the agent, which then produces the final user-facing response.
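The tool-calling half can be sketched the same way. Swarm accepts plain Python functions as tools; the dispatch helper and the weather data below are hypothetical stand-ins for the framework's internal execution step:

```python
import json

# A tool is an ordinary Python function with typed, structured inputs.
def get_weather(city: str) -> str:
    # Stubbed lookup; a real tool would call a weather API here.
    data = {"Paris": "18°C, partly cloudy", "Oslo": "7°C, rain"}
    return json.dumps({"city": city, "conditions": data.get(city, "unknown")})

# Simplified dispatch: the model emits a tool call with structured
# arguments, the matching function is executed, and its result is appended
# to the conversation for the agent to phrase into a final reply.
def execute_tool_call(functions, name, arguments):
    tools = {f.__name__: f for f in functions}
    return tools[name](**arguments)

raw = execute_tool_call([get_weather], "get_weather", {"city": "Paris"})
```

Note that `raw` is machine-oriented JSON: it still needs the agent's final turn to become a user-facing answer, which is exactly the rephrasing concern raised in the next question.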

What’s the practical lesson from the custom DuckDuckGo search agent about tool outputs?

The search agent demonstrates that tool outputs often need a second “reasoner” step to rephrase results into natural language. If instructions are too minimal, the assistant may simply echo raw tool output instead of converting it into a coherent answer. The transcript highlights this as a common pitfall: retrieval should feed a reasoning/rephrasing stage, not replace it.
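The two-stage pattern can be sketched as retrieval followed by an explicit rephrasing prompt. The stubbed search results and prompt wording below are illustrative, not from the transcript's actual code:

```python
# Stubbed retrieval; the real agent calls a DuckDuckGo search tool.
def duckduckgo_search(query):
    return [
        {"title": "OpenAI Swarm on GitHub",
         "snippet": "Lightweight multi-agent framework."},
        {"title": "Swarm routines",
         "snippet": "Routines become system prompts."},
    ]

# The "reasoner" step: instruct the model to rewrite the raw results
# instead of echoing them back to the user.
def build_rephrase_prompt(query, results):
    lines = [f"- {r['title']}: {r['snippet']}" for r in results]
    return (
        f"Using only the search results below, answer the question "
        f"'{query}' in plain, user-friendly language. Do not return the "
        "raw results verbatim.\n" + "\n".join(lines)
    )

prompt = build_rephrase_prompt("What is Swarm?",
                               duckduckgo_search("What is Swarm?"))
```

Without the second stage, the assistant's laziest valid completion is to dump the tool output as-is, which is the pitfall the transcript flags.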

How do the examples enforce agent-specific behavior after a handoff?

Agent-specific behavior is enforced through each agent’s system prompt and toolset. For instance, the cat agent requires responses to end with “Meow meow Meow,” and the language-specific agents only respond in their target language. When the English agent hands off to the Spanish agent, the conversation continues but the response language and constraints change according to the receiving agent’s instructions.

Review Questions

  1. In Swarm, how do routines and handoffs work together to create a multi-step workflow without one agent holding all tools?
  2. What risks arise from Swarm’s lighter state/memory approach, and how might a developer mitigate loop behavior?
  3. Why is it important for an agent to rephrase tool outputs rather than returning raw JSON or tool text directly?

Key Points

  1. Swarm’s architecture is built around routines (structured instruction lists) and handoffs (conversation transfers to specialized agents).
  2. Routines turn natural-language steps into an agent’s system prompt, guiding both behavior and tool usage across multiple turns.
  3. Handoffs preserve conversation context while switching to a new agent’s system prompt and toolset, enabling cascaded delegation flows.
  4. Swarm is currently oriented toward OpenAI models; adapting it to other model runtimes like Ollama may work but can be unreliable.
  5. Swarm relies more on conversation memory than a full state system, which can simplify debugging but increases loop risk for complex workflows.
  6. Tool calling and context-variable injection let agents perform actions (weather, email, refunds, flight changes) and personalize responses using runtime data.
  7. A separate reasoning/rephrasing step is important so the assistant turns tool outputs into user-friendly answers instead of echoing raw results.

Highlights

Swarm’s signature mechanism is handoff: an agent can transfer the conversation to another agent that has different instructions and tools, without losing prior context.
Routines act like step-by-step playbooks—numbered instructions that become the system prompt—so the assistant follows a predictable sequence rather than free-form prompting.
The framework’s lightweight design trades away robust state/memory features, making it easier to understand but potentially harder to control in long-running, loop-prone scenarios.
In the custom search example, tool outputs must be rephrased by a reasoning layer; otherwise the assistant may just return raw retrieval results.

Topics

  • Swarm Framework
  • Multi-Agent Handoffs
  • Agent Routines
  • Tool Calling
  • Context Injection
