Build Hour: Codex
Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Codex is being unified into one agent experience across IDE extensions, the Codex CLI, GitHub code review, and asynchronous cloud task execution.
Briefing
Codex is being reshaped into a single “agent everywhere you code” experience—spanning VS Code-style IDEs, the terminal, GitHub code review, and cloud task handoffs—so developers can delegate engineering work without stitching together separate tools. The practical shift matters because it changes how teams can run software changes: instead of treating AI as a chat assistant, Codex now supports multi-step workflows that modify code locally, execute work in a secure cloud sandbox, and return pull requests that can be merged or iterated on.
OpenAI’s rollout ties together several moving parts. Codex started with a lightweight open-source Codex CLI (usable via ChatGPT sign-in or API keys) and then added Codex in ChatGPT, an asynchronous cloud agent that runs code remotely and produces PRs. Those earlier experiences were powerful but siloed. Last week’s release aims to unify them by adding a new IDE extension that works in VS Code, Cursor, and other VS Code-compatible forks, bringing CLI-like functionality directly into the editor. In parallel, the Codex CLI is being improved rapidly (open-source releases, UI and reliability upgrades) and now includes features like GitHub code review for new PRs.
A key capability introduced in the demos is task delegation across environments. Codex can work locally—effectively “pair programming” by executing code and modifying the developer’s machine—or remotely in a secure cloud sandbox for asynchronous work that returns PRs. The cloud workflow can also pull down results into the existing local repository state, letting developers test, apply diffs, and iterate without committing immediately.
The event’s walkthrough uses the open-source Agents SDK TypeScript monorepo as a live target. Codex first answers repository questions by traversing the codebase in chat mode, then switches into agent mode to make changes with an approval step. The demo highlights how repository structure affects performance: a monorepo with clearly named packages helps Codex navigate and compare example implementations. It also shows Codex handling a nontrivial real-time API update, including adding continuous image input at one frame per second, and then launching additional cloud tasks to update other demo apps.
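The one-frame-per-second image input boils down to simple throttling logic. The sketch below shows one way to gate frame sends; `FrameThrottle` and `shouldSend` are illustrative names for this note, not API from the Agents SDK or the demo app:

```typescript
// Sketch of 1 fps frame gating for a realtime session.
// A capture loop may poll much faster (e.g. every 100 ms); the throttle
// only lets roughly one frame per second through.
class FrameThrottle {
  private lastSentMs = -Infinity;

  constructor(private readonly intervalMs: number = 1000) {}

  // Returns true when enough time has elapsed to send the next frame,
  // and records the send time so subsequent calls are gated.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentMs >= this.intervalMs) {
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}
```

In the demo, Codex wires logic like this into an existing real-time demo app; the point of the sketch is only that the change is small and testable in isolation.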
To reduce the risk of “one-shot” prompt brittleness, Codex supports running multiple attempts in parallel. In the demo, four cloud attempts generate multiple PR candidates, and the developer selects the best result—an approach framed as saving time otherwise spent on prompt engineering. The workflow also leans on a mental model shift: developers should think more like engineering managers or architects, structuring work into parallelizable tasks, kicking off background jobs when bugs or follow-ups appear, and returning later to pull down and refine PRs.
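The fan-out pattern behind parallel attempts can be sketched generically. In the actual workflow the "score" is a human reviewer choosing among PR candidates; the `bestOfN` helper and its `score` parameter are hypothetical, not part of any Codex API:

```typescript
// Illustrative fan-out: run N independent attempts concurrently and
// keep the highest-scoring result, mirroring "launch four cloud tasks,
// pick the best PR".
async function bestOfN<T>(
  n: number,
  attempt: (i: number) => Promise<T>,
  score: (result: T) => number,
): Promise<T> {
  const results = await Promise.all(
    Array.from({ length: n }, (_, i) => attempt(i)),
  );
  return results.reduce((best, r) => (score(r) > score(best) ? r : best));
}
```

The design point is that attempts are independent, so a weak result costs nothing beyond the parallel run; selection replaces iterative prompt tuning.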
Finally, GitHub code review is presented as more than static analysis. Codex reviews PRs by validating intent against diffs and running code in its own environment when needed, then posts review feedback. The walkthrough shows a teammate opening a PR and receiving an edge-case catch that would likely have been missed, followed by a follow-up task to incorporate fixes.
The takeaway is a workflow playbook: structure repos for navigation, document conventions with agents.md, let Codex write and run tests (including compiling TypeScript examples and using existing test suites), plan complex changes in markdown, and trigger tasks as inspiration hits—sometimes from mobile—so engineering momentum doesn’t stall while waiting to get back to a computer.
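The video recommends agents.md for documenting conventions. An illustrative file might look like the following; all of the specific contents here are invented for the sketch, not taken from the video:

```markdown
# agents.md (illustrative example)

## Project conventions
- TypeScript monorepo; packages live under `packages/<name>`.
- Compile examples with `tsc` to catch type errors before review.
- Run the existing test suite before opening a PR.

## Working style
- For complex features, write a plan in a markdown file first,
  then implement it package by package.
```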
Cornell Notes
Codex is evolving into one unified engineering agent across IDEs, the terminal, GitHub, and cloud execution. Instead of siloed experiences, it now supports local changes, secure cloud sandbox runs that return PRs, and automatic GitHub code review that can validate intent by running code. The demos show how Codex navigates a TypeScript monorepo, updates real-time API demos, and delegates follow-on work to the cloud while local work continues. Running multiple parallel attempts helps avoid fragile “perfect prompt” dependency, and developers can later apply diffs or pull down results to iterate. The practical value is a workflow shift: treat Codex like an orchestrated engineering teammate—plan, test, review, and delegate—rather than a single chat session.
What does “Codex everywhere you code” mean in practice, and how do the interfaces differ?
How does Codex decide where work runs—locally or in the cloud—and why does that matter?
Why does repository structure—like monorepos and named packages—affect Codex performance?
What is the role of “agent mode” versus “chat mode,” and what safety tradeoff appears in the demo?
How does Codex reduce the risk of a bad one-shot change when the task is complex?
What makes Codex code review different from basic static analysis?
Review Questions
- How would you structure a repository (packages, naming, and documentation) to make Codex’s code navigation and parallel tasking more reliable?
- In a workflow with both local and cloud tasks, what steps would you take to keep changes testable without committing too early?
- When should you use parallel attempts versus planning with agents.md/plan markdown files for complex features?
Key Points
1. Codex is being unified into one agent experience across IDE extensions, the Codex CLI, GitHub code review, and asynchronous cloud task execution.
2. Codex can run work locally (pair-programming style changes) or remotely in a secure cloud sandbox (asynchronous PR generation), and results can be pulled back into the local repo state.
3. The IDE extension works in VS Code, Cursor, and VS Code-compatible forks, bringing CLI-like capabilities directly into the editor.
4. Agent mode supports change-making with an approval step, while chat mode is read-only; full-access modes can write outside the workspace and should be used carefully.
5. Running multiple parallel attempts helps avoid fragile “perfect prompt” dependency by producing several PR candidates for selection.
6. Codex code review can validate PR intent by running code in its own environment, not just performing static checks.
7. Using agents.md and planning complex changes in markdown improves Codex’s ability to follow conventions and coordinate multi-step work with higher success rates.