OpenAI DevDay 2024 | Community Spotlight | Swyx
Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
AI agents are best built by treating them as a stack—LLMs plus memory plus planning plus tools—then mapping those capabilities to concrete engineering jobs as model abilities advance. The core message is that developers don’t need to be PhD-level researchers to ship state-of-the-art agents; they need a clear mental model of what an agent requires and where new model capabilities create new opportunities.
The talk frames “agents” with a practical definition attributed to Lilian Weng: agents are “LLM + memory + planning + tools.” The presenter then refines that idea into an engineering checklist. First comes the infrastructure layer: teams need a gateway and operations tooling, plus retrieval-augmented generation (RAG) frameworks to connect models to data. Open-source tools are highlighted as the default starting point, and the talk points attendees toward Singapore-based work, including Feedist AI, as evidence that local builders are already contributing to the ecosystem.
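The “gateway” idea from this infrastructure layer can be sketched as a thin routing layer in front of multiple model backends with fallback on failure. The backends below are hypothetical stubs standing in for real model APIs; a production gateway would also handle auth, logging, and rate limits (the “ops” part).

```python
def fast_model(prompt: str) -> str:
    # Stub primary backend; simulate an outage to show the fallback path.
    raise RuntimeError("backend unavailable")

def fallback_model(prompt: str) -> str:
    # Stub secondary backend that always succeeds.
    return f"echo: {prompt}"

# Ordered list of (name, backend) pairs the gateway tries in turn.
BACKENDS = [("fast", fast_model), ("fallback", fallback_model)]

def gateway(prompt: str):
    """Try each backend in order; return the first success, else raise."""
    errors = []
    for name, backend in BACKENDS:
        try:
            return name, backend(prompt)
        except Exception as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all backends failed: {errors}")

print(gateway("hello"))  # falls through to the fallback backend
```

The same shape accommodates routing by cost or latency instead of simple ordering.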
Second is memory and knowledge. ChatGPT-style systems can be improved dramatically when memory is added, and the talk treats memory as a core agent capability rather than an optional feature. For knowledge storage and retrieval, vector databases remain common, but knowledge graphs are presented as an increasingly interesting direction. A standout example is “Graph RAG,” which was described as the most surprising popular talk at the AI Engineer Conference the presenter runs.
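The vector-database retrieval step the talk contrasts with knowledge graphs reduces to nearest-neighbor search over embeddings. The sketch below hand-rolls both the embeddings and the store with made-up toy vectors, purely to make the retrieval mechanics visible; real systems use an embedding model and a vector database.

```python
import math

# Toy in-memory "vector store": document text mapped to a made-up embedding.
DOCS = {
    "Agents need memory to persist state across sessions.": [0.9, 0.1, 0.2],
    "Knowledge graphs store entities and their relationships.": [0.1, 0.9, 0.3],
    "Space Invaders is a classic arcade game.": [0.2, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

# A query embedding that leans toward the "memory" direction.
print(retrieve([0.85, 0.15, 0.1]))
```

A knowledge-graph approach would instead traverse typed edges between entities, which is why it wins when the question depends on relationships rather than surface similarity.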
Third is planning and multi-agent behavior—described as the most active research area. The talk recommends reading OpenAI’s “Let’s Verify Step-by-Step” work for process-style reasoning, and also points to the SW F project as another reference point. Multi-agent systems are positioned as a straightforward path to better performance, and the presenter promises a demo showing how splitting tasks across agents improves outcomes.
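The task-splitting pattern behind multi-agent performance gains can be sketched as a planner that decomposes a goal into subtasks dispatched to workers. Both roles are stubs standing in for LLM calls, and the decomposition is hard-coded for illustration.

```python
def planner(goal: str) -> list[str]:
    """Stub planner: decompose a goal into an ordered list of subtasks."""
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def worker(subtask: str) -> str:
    """Stub worker agent: pretend to complete one subtask."""
    return f"done: {subtask}"

def run_multi_agent(goal: str) -> list[str]:
    """Plan, then fan subtasks out to workers and collect results."""
    return [worker(task) for task in planner(goal)]

print(run_multi_agent("summary"))
```

In a real system each subtask would be a separate model call with its own focused prompt, which is the source of the quality improvement the talk attributes to multi-agent designs.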
Finally, tools and orchestration tie the stack together. The talk argues that many agent systems converge on the same building blocks: a code interpreter/sandbox for deterministic computation, browser control for searching and reading from the web, and a self-ask or ReAct-style loop that cycles through observing and reacting to the environment. These components are presented as the practical substrate behind agent frameworks and products, including Microsoft’s Magentic-One and Cognition’s work, which the presenter says align closely with OpenAI’s architecture.
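The ReAct/self-ask loop named above can be sketched as a control loop that alternates between asking the model for an action and feeding tool observations back. The `llm` below is a scripted stub standing in for a real model call, so the control flow is visible end to end.

```python
def calculator(expression: str) -> str:
    """Stand-in for a sandboxed code interpreter (deterministic computation)."""
    # Toy only: real sandboxes are far stricter than stripping builtins.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def llm(history):
    """Scripted stub: request a tool once, then answer with the observation."""
    observations = [step for step in history if step[0] == "observation"]
    if not observations:
        return ("action", "calculator", "6 * 7")
    return ("final", f"The answer is {observations[-1][1]}")

def react_loop(question: str, max_steps: int = 5) -> str:
    """Observe/act until the model emits a final answer or steps run out."""
    history = [("question", question)]
    for _ in range(max_steps):
        step = llm(history)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        history.append(("observation", TOOLS[tool_name](tool_input)))
    return "gave up"

print(react_loop("What is 6 times 7?"))  # -> "The answer is 42"
```

Swapping the stub for a real model call and adding a browser tool to `TOOLS` yields the basic agent runtime the talk describes.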
The demo turns the theory into a workflow. Starting from a fork of Bolt.new, the presenter generates a Space Invaders-style game from a short prompt, then adds a second agent to iteratively improve gameplay: aliens arrive in waves, power-ups trigger on kills, stars appear in the background, and visuals become more “alien” using emoji styling. The second agent effectively performs QA and feature requests in the background, demonstrating how planning and multi-agent collaboration can transform a rough prototype into a more engaging experience. The closing takeaway is a call to action for Singapore: use foundation models to build an “AI engineering nation,” leveraging the agent stack as a roadmap for what to build next as capabilities keep improving.
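The demo’s builder-plus-QA pattern can be sketched as a loop in which one agent produces a draft and a second reviews it and requests missing features until satisfied. Both agents are stubs standing in for LLM calls; the hard-coded wish list mirrors the talk’s examples (alien waves, power-ups, background stars).

```python
def builder_agent(built: set, feedback: list) -> set:
    """Stub builder: pretend to patch the game by adding requested features."""
    return built | set(feedback)

def qa_agent(built: set) -> list:
    """Stub QA agent: compare the build against a wish list, request the gaps."""
    wish_list = {"alien waves", "power-ups on kills", "background stars"}
    return sorted(wish_list - built)

def improve(initial_spec: set, max_rounds: int = 5) -> list:
    """Loop builder and QA until QA has no more requests."""
    built = set(initial_spec)
    for _ in range(max_rounds):
        requests = qa_agent(built)
        if not requests:
            break
        built = builder_agent(built, requests)
    return sorted(built)

print(improve({"basic space invaders"}))
```

The loop terminates when QA is satisfied, which is the same “background QA and feature requests” dynamic the presenter showed live.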
Cornell Notes
The talk presents AI agents as a layered engineering stack: LLMs plus memory plus planning plus tools. It recommends building agents by mapping each layer to specific jobs—gateway/ops/RAG for infrastructure, memory and knowledge stores (including vector DBs and knowledge graphs) for retrieval, planning and multi-agent approaches for better performance, and tool orchestration (code execution, browser control, and ReAct/self-ask loops) for real-world interaction. A live demo shows how adding a second agent to handle QA and feature requests upgrades a generated Space Invaders game into a more polished, engaging experience. The practical takeaway: developers can build advanced agents without deep research training by using a mental map of agent capabilities and updating it as model capabilities improve.
What does “LLM + memory + planning + tools” mean in engineering terms?
Why does the talk treat memory as more than a nice-to-have?
How do knowledge graphs fit alongside vector databases?
What’s the role of planning and multi-agent design?
Which tool/orchestration primitives recur across agent systems?
How did the demo demonstrate multi-agent improvement?
Review Questions
- If you had to design an agent from scratch, what concrete components would you assign to the “infrastructure,” “memory/knowledge,” “planning/multi-agent,” and “tools/orchestration” layers?
- What tradeoffs might lead a team to choose knowledge graphs (Graph RAG) over vector databases for retrieval?
- In the demo workflow, what specific improvements were requested by the second agent, and how do those requests map to planning and tool use?
Key Points
1. Treat an AI agent as a stack: LLMs plus memory plus planning plus tools, and build each layer with a clear engineering purpose.
2. Start with infrastructure primitives like a gateway/ops tooling and RAG frameworks to connect models to data reliably.
3. Make memory a first-class capability, not an afterthought, and study implementation approaches such as the MemGPT paper.
4. Use knowledge graphs (Graph RAG) as a serious alternative to vector databases when the problem benefits from structured relationships.
5. Improve performance by adding planning and multi-agent behavior, using references like “Let’s Verify Step-by-Step” and SW F as starting points.
6. Standardize on tool/orchestration building blocks: sandbox code execution, browser control, and ReAct/self-ask loops.
7. Adopt a “capabilities map” mindset: when model capabilities advance, it creates new opportunities to upgrade agent performance and user experience.