
OpenAI DevDay 2024 | Community Spotlight | Swyx

OpenAI · 5 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Treat an AI agent as a stack: LLMs plus memory plus planning plus tools, and build each layer with a clear engineering purpose.

Briefing

AI agents are best built by treating them as a stack—LLMs plus memory plus planning plus tools—then mapping those capabilities to concrete engineering jobs as model abilities advance. The core message is that developers don’t need to be PhD-level researchers to ship state-of-the-art agents; they need a clear mental model of what an agent requires and where new model capabilities create new opportunities.

The talk frames “agents” with a practical definition attributed to Lilian Weng: agents are “LLM + memory + planning + tools.” The presenter then refines that idea into an engineering checklist. First comes the infrastructure layer: teams need a gateway and operations tooling, plus retrieval-augmented generation (RAG) frameworks to connect models to data. Open-source tools are highlighted as the default starting point, and the talk points attendees toward Singapore-based work, including Feedist AI, as evidence that local builders are already contributing to the ecosystem.
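To make the infrastructure layer concrete, here is a minimal, self-contained sketch of the RAG pattern: embed documents, retrieve the closest matches for a query, and pack them into the model prompt. The `embed` function and in-memory index are toy stand-ins; a real system would use an embedding model behind the team’s gateway and a vector database.

```python
import math

# Toy embedding: normalized letter-frequency vector. A real system would call
# an embedding model through the team's gateway; the stub keeps this runnable.
def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# In practice this index lives in a vector database; here it is a plain list.
docs = [
    "Agents combine an LLM with memory, planning, and tools.",
    "RAG retrieves relevant context before calling the model.",
    "ReAct loops alternate reasoning steps with tool calls.",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

def rag_prompt(query: str) -> str:
    # A real system would send this augmented prompt to the model.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("How does RAG work?"))
```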

Second is memory and knowledge. ChatGPT-style systems can be improved dramatically when memory is added, and the talk treats memory as a core agent capability rather than an optional feature. For knowledge storage and retrieval, vector databases remain common, but knowledge graphs are presented as an increasingly interesting direction. A standout example is “Graph RAG,” which was described as the most surprisingly popular talk at the AI Engineer Conference the presenter runs.

Third is planning and multi-agent behavior—described as the most active research area. The talk recommends reading OpenAI’s “Let’s Verify Step-by-Step” work for process-style reasoning, and also points to the SW F project as another reference point. Multi-agent systems are positioned as a straightforward path to better performance, and the presenter promises a demo showing how splitting tasks across agents improves outcomes.

Finally, tools and orchestration tie the stack together. The talk argues that many agent systems converge on the same building blocks: a code interpreter/sandbox for deterministic computation, browser control for searching and reading from the web, and a self-ask or ReAct-style loop that cycles through observing and reacting to the environment. These components are presented as the practical substrate behind agent frameworks and products, including Microsoft’s Magentic-One and Cognition (the company behind Devin), which the presenter says align closely with OpenAI’s architecture.

The demo turns the theory into a workflow. Starting from a fork of Bolt.new, the presenter generates a Space Invaders-style game from a short prompt, then adds a second agent to iteratively improve gameplay: aliens arrive in waves, power-ups trigger on kills, stars appear in the background, and visuals become more “alien” using emoji styling. The second agent effectively handles QA and feature requests in the background, demonstrating how planning and multi-agent collaboration can transform a rough prototype into a more engaging experience. The closing takeaway is a call to action for Singapore: use foundation models to build an “AI engineering nation,” leveraging the agent stack as a roadmap for what to build next as capabilities keep improving.

Cornell Notes

The talk presents AI agents as a layered engineering stack: LLMs plus memory plus planning plus tools. It recommends building agents by mapping each layer to specific jobs—gateway/ops/RAG for infrastructure, memory and knowledge stores (including vector DBs and knowledge graphs) for retrieval, planning and multi-agent approaches for better performance, and tool orchestration (code execution, browser control, and ReAct/self-ask loops) for real-world interaction. A live demo shows how adding a second agent to handle QA and feature requests upgrades a generated Space Invaders game into a more polished, engaging experience. The practical takeaway: developers can build advanced agents without deep research training by using a mental map of agent capabilities and updating it as model capabilities improve.

What does “LLM + memory + planning + tools” mean in engineering terms?

It’s a blueprint for what must exist in an agent system. LLMs provide language and reasoning; memory and knowledge store what the agent should remember and retrieve; planning coordinates multi-step work (often via multi-agent setups); tools enable actions the model can’t do alone—like running code in a sandbox, browsing/searching the web, and iterating through observe→reason→act loops.
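As an illustration of that blueprint, a toy agent class can wire the four layers together. The names and structure here are ours, not the talk’s; a real system would replace the stub model with prompted API calls.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical blueprint: names and structure are illustrative, not from the talk.
@dataclass
class Agent:
    llm: Callable[[str], str]                          # language and reasoning
    memory: list[str] = field(default_factory=list)    # what the agent remembers
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def plan(self, goal: str) -> list[str]:
        # Planning: ask the model to break the goal into steps.
        steps = self.llm(f"List steps to: {goal}")
        return [s for s in steps.splitlines() if s.strip()]

    def act(self, goal: str) -> str:
        for step in self.plan(goal):
            self.memory.append(step)                   # record progress
            if step.startswith("tool:"):               # actions the model can't do alone
                name, _, arg = step[len("tool:"):].partition(" ")
                self.memory.append(self.tools.get(name, lambda a: "no such tool")(arg))
        return self.memory[-1]

# Stub model so the sketch runs without an API key.
agent = Agent(llm=lambda p: "tool:echo hello\nsummarize results",
              tools={"echo": lambda a: f"echoed: {a}"})
print(agent.act("demo the stack"))
```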

Why does the talk treat memory as more than a nice-to-have?

Memory is framed as a core agent capability that improves how an agent behaves over time. The talk contrasts “ChatGPT with memory” as a meaningful upgrade and points to the MemGPT paper as a recommended read for implementing memory in agent systems.
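A minimal sketch of the underlying idea (not the MemGPT implementation itself): persist salient facts between sessions and prepend them to each prompt. The file path and fact format are illustrative.

```python
import json
import pathlib

MEMORY_FILE = pathlib.Path("agent_memory.json")  # illustrative location

def load_memory() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def remember(fact: str) -> None:
    facts = load_memory()
    if fact not in facts:
        facts.append(fact)
        MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def build_prompt(user_message: str) -> str:
    # Prepend stored facts so the model "remembers" across sessions.
    facts = "\n".join(f"- {f}" for f in load_memory())
    return f"Known about this user:\n{facts}\n\nUser: {user_message}"

remember("prefers concise answers")
print(build_prompt("Explain agents."))
```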

How do knowledge graphs fit alongside vector databases?

Vector databases are described as well-known, but knowledge graphs are highlighted as an increasingly popular alternative for knowledge representation. The talk specifically calls out “Graph RAG” as a notable approach, suggesting that graph-structured knowledge can change how retrieval and reasoning work inside agents.
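A toy illustration of the graph-retrieval idea (the graph contents, relation labels, and choice of `networkx` are ours, not from the talk): instead of fetching the top-k most similar chunks, fetch an entity’s neighborhood as triples for the prompt.

```python
import networkx as nx  # pip install networkx

# Toy knowledge graph; real Graph RAG builds this from documents with an LLM.
kg = nx.Graph()
kg.add_edge("agent", "memory", relation="has component")
kg.add_edge("agent", "planning", relation="has component")
kg.add_edge("memory", "MemGPT", relation="described in")
kg.add_edge("planning", "multi-agent", relation="enables")

def graph_retrieve(entity: str, hops: int = 2) -> list[str]:
    # Pull the entity's k-hop neighborhood as textual triples for the prompt,
    # rather than top-k similar chunks from a vector store.
    nodes = nx.single_source_shortest_path_length(kg, entity, cutoff=hops)
    facts = []
    for u, v, data in kg.edges(data=True):
        if u in nodes and v in nodes:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

print("\n".join(graph_retrieve("agent")))
```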

What’s the role of planning and multi-agent design?

Planning is presented as the most active research area and a major lever for performance. Multi-agent setups are described as a simple way to improve results by splitting tasks across specialized agents. The talk recommends OpenAI references like “Let’s Verify Step-by-Step” and also points to the SW F project as additional material.
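A sketch of the task-splitting pattern the talk describes. The planner/worker/critic roles are illustrative; in a real system each would be a separately prompted model call rather than a stub.

```python
from typing import Callable

LLM = Callable[[str], str]

def multi_agent_answer(task: str, planner: LLM, worker: LLM, critic: LLM) -> str:
    # Planner decomposes the task into independent subtasks.
    subtasks = [s for s in planner(f"Split into subtasks: {task}").splitlines() if s]
    # A worker agent handles each subtask in isolation.
    drafts = [worker(f"Solve: {s}") for s in subtasks]
    # A critic agent reviews and merges the drafts (the QA role from the demo).
    return critic("Review and merge:\n" + "\n".join(drafts))

# Stubs so the sketch runs; real agents would be separate model calls.
planner = lambda p: "draw aliens\nmove aliens in waves"
worker = lambda p: f"done: {p}"
critic = lambda p: p.replace("Review and merge:", "merged:")
print(multi_agent_answer("build space invaders", planner, worker, critic))
```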

Which tool/orchestration primitives recur across agent systems?

The talk groups tools into three recurring primitives: (1) a code interpreter/sandbox (e.g., E2B) for deterministic computation, (2) browser control for searching and reading external information, and (3) a self-ask or ReAct-style loop that cycles through observing and reacting to the environment. It links these to common agent frameworks and mentions Microsoft’s Magentic-One and Cognition (maker of Devin) as examples to check out.
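These primitives compose naturally into a loop. Below is a minimal self-ask/ReAct-style sketch; the `Act:`/`Obs:`/`Final:` protocol is illustrative, not any specific framework’s format.

```python
from typing import Callable

def react_loop(question: str, llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """ReAct-style loop: the model alternates between reasoning and
    tool calls until it emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # model proposes the next action
        transcript += step + "\n"
        if step.startswith("Final:"):   # model has decided it is done
            return step[len("Final:"):].strip()
        if step.startswith("Act:"):     # e.g. "Act: search space invaders"
            name, _, arg = step[len("Act:"):].strip().partition(" ")
            observation = tools.get(name, lambda a: "unknown tool")(arg)
            transcript += f"Obs: {observation}\n"   # feed the result back in
    return "gave up"

# Stub model: first calls a tool, then answers. Real loops parse model output.
script = iter(["Act: search invaders", "Final: classic arcade game"])
result = react_loop("what is space invaders?",
                    llm=lambda t: next(script),
                    tools={"search": lambda q: f"results for {q}"})
print(result)
```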

How did the demo demonstrate multi-agent improvement?

The demo starts with Bolt.new generating a basic Space Invaders game from a prompt. Then a second agent is added—using voice—to request and implement gameplay upgrades: aliens should move in waves, power-ups should appear when aliens die, stars should be added for atmosphere, and visuals should look more alien (including emoji styling). The second agent effectively performs QA and feature iteration, producing a more engaging game than the initial single-agent output.
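The workflow reduces to a simple generate-review loop. Here is a toy sketch; the stubs stand in for Bolt.new-style code generation and the voice-driven QA agent, and the stopping condition is our own simplification.

```python
from typing import Callable

def build_with_qa(prompt: str, builder: Callable[[str], str],
                  qa: Callable[[str], str], rounds: int = 3) -> str:
    """Sketch of the demo workflow: one agent generates the app, a second
    agent plays QA, requesting features the builder then implements."""
    app = builder(prompt)                 # first draft of the game
    for _ in range(rounds):
        feedback = qa(app)                # QA agent files a feature request
        if feedback == "ship it":
            break
        app = builder(f"{app}\nApply feedback: {feedback}")
    return app

# Stubs standing in for codegen and the QA agent's feature requests.
features = iter(["aliens move in waves", "power-ups on kill", "ship it"])
game = build_with_qa("space invaders",
                     builder=lambda p: p,
                     qa=lambda app: next(features))
print(game)
```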

Review Questions

  1. If you had to design an agent from scratch, what concrete components would you assign to the “infrastructure,” “memory/knowledge,” “planning/multi-agent,” and “tools/orchestration” layers?
  2. What tradeoffs might lead a team to choose knowledge graphs (Graph RAG) over vector databases for retrieval?
  3. In the demo workflow, what specific improvements were requested by the second agent, and how do those requests map to planning and tool use?

Key Points

  1. Treat an AI agent as a stack: LLMs plus memory plus planning plus tools, and build each layer with a clear engineering purpose.

  2. Start with infrastructure primitives such as a gateway, ops tooling, and RAG frameworks to connect models to data reliably.

  3. Make memory a first-class capability, not an afterthought, and study implementation approaches such as the MemGPT paper.

  4. Use knowledge graphs (Graph RAG) as a serious alternative to vector databases when the problem benefits from structured relationships.

  5. Improve performance by adding planning and multi-agent behavior, using references like “Let’s Verify Step-by-Step” and SW F as starting points.

  6. Standardize on tool/orchestration building blocks: sandbox code execution, browser control, and ReAct/self-ask loops.

  7. Adopt a “capabilities map” mindset: when model capabilities advance, it creates new opportunities to upgrade agent performance and user experience.

Highlights

Agents are framed as an engineering stack—LLM + memory + planning + tools—where each layer maps to specific build jobs.
Graph RAG is singled out as a standout direction, suggesting knowledge graphs can meaningfully change retrieval and reasoning.
The demo shows multi-agent QA iteration: a second agent turns a basic generated Space Invaders prototype into a more engaging game by adding waves, power-ups, and visual upgrades.
The recurring agent primitives are concrete: sandbox code execution, browser control, and ReAct/self-ask loops.

Topics

  • AI Engineering
  • Agent Stack
  • Memory and Knowledge
  • Multi-Agent Planning
  • Tool Orchestration