Build Hour: Codex
Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Codex is being unified into one agent experience across IDE extensions, the Codex CLI, GitHub code review, and asynchronous cloud task execution.
Briefing
Codex is being reshaped into a single “agent everywhere you code” experience—spanning VS Code-style IDEs, the terminal, GitHub code review, and cloud task handoffs—so developers can delegate engineering work without stitching together separate tools. The practical shift matters because it changes how teams can run software changes: instead of treating AI as a chat assistant, Codex now supports multi-step workflows that modify code locally, execute work in a secure cloud sandbox, and return pull requests that can be merged or iterated on.
OpenAI’s rollout ties together several moving parts. Codex started with a lightweight open-source Codex CLI (usable via ChatGPT sign-in or API keys) and then added Codex in ChatGPT, an asynchronous cloud agent that runs code remotely and produces PRs. Those earlier experiences were powerful but siloed. Last week’s release aims to unify them by adding a new IDE extension that works in VS Code, Cursor, and other VS Code-compatible forks, bringing CLI-like functionality directly into the editor. In parallel, the Codex CLI is being improved rapidly (open-source releases, UI and reliability upgrades) and now includes features like GitHub code review for new PRs.
A key capability introduced in the demos is task delegation across environments. Codex can work locally—effectively “pair programming” by executing code and modifying the developer’s machine—or remotely in a secure cloud sandbox for asynchronous work that returns PRs. The cloud workflow can also pull down results into the existing local repository state, letting developers test, apply diffs, and iterate without committing immediately.
The event’s walkthrough uses the open-source Agents SDK TypeScript monorepo as a live target. Codex first answers repository questions by traversing the codebase in chat mode, then switches into agent mode to make changes with an approval step. The demo highlights how repository structure affects performance: a monorepo with clearly named packages helps Codex navigate and compare example implementations. It also shows Codex handling a nontrivial real-time API update, including adding continuous image input at one frame per second, and then launching additional cloud tasks to update other demo apps.
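The one-frame-per-second image input boils down to simple throttling logic. The sketch below shows one way to gate frame sends; `FrameThrottle` and `shouldSend` are illustrative names for this note, not API from the Agents SDK or the demo app:

```typescript
// Sketch of 1 fps frame gating for a realtime session.
// A capture loop may poll much faster (e.g. every 100 ms); the throttle
// only lets roughly one frame per second through.
class FrameThrottle {
  private lastSentMs = -Infinity;

  constructor(private readonly intervalMs: number = 1000) {}

  // Returns true when enough time has elapsed to send the next frame,
  // and records the send time so subsequent calls are gated.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentMs >= this.intervalMs) {
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}
```

In the demo, Codex wires logic like this into an existing real-time demo app; the point of the sketch is only that the change is small and testable in isolation.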
To reduce the risk of “one-shot” prompt brittleness, Codex supports running multiple attempts in parallel. In the demo, four cloud attempts generate multiple PR candidates, and the developer selects the best result—an approach framed as saving time otherwise spent on prompt engineering. The workflow also leans on a mental model shift: developers should think more like engineering managers or architects, structuring work into parallelizable tasks, kicking off background jobs when bugs or follow-ups appear, and returning later to pull down and refine PRs.
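The fan-out pattern behind parallel attempts can be sketched generically. In the actual workflow the "score" is a human reviewer choosing among PR candidates; the `bestOfN` helper and its `score` parameter are hypothetical, not part of any Codex API:

```typescript
// Illustrative fan-out: run N independent attempts concurrently and
// keep the highest-scoring result, mirroring "launch four cloud tasks,
// pick the best PR".
async function bestOfN<T>(
  n: number,
  attempt: (i: number) => Promise<T>,
  score: (result: T) => number,
): Promise<T> {
  const results = await Promise.all(
    Array.from({ length: n }, (_, i) => attempt(i)),
  );
  return results.reduce((best, r) => (score(r) > score(best) ? r : best));
}
```

The design point is that attempts are independent, so a weak result costs nothing beyond the parallel run; selection replaces iterative prompt tuning.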
Finally, GitHub code review is presented as more than static analysis. Codex reviews PRs by validating intent against diffs and running code in its own environment when needed, then posts review feedback. The walkthrough shows a teammate opening a PR and receiving an edge-case catch that would likely have been missed, followed by a follow-up task to incorporate fixes.
The takeaway is a workflow playbook: structure repos for navigation, document conventions with agents.md, let Codex write and run tests (including compiling TypeScript examples and using existing test suites), plan complex changes in markdown, and trigger tasks as inspiration hits—sometimes from mobile—so engineering momentum doesn’t stall while waiting to get back to a computer.
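The video recommends agents.md for documenting conventions. An illustrative file might look like the following; all of the specific contents here are invented for the sketch, not taken from the video:

```markdown
# agents.md (illustrative example)

## Project conventions
- TypeScript monorepo; packages live under `packages/<name>`.
- Compile examples with `tsc` to catch type errors before review.
- Run the existing test suite before opening a PR.

## Working style
- For complex features, write a plan in a markdown file first,
  then implement it package by package.
```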
Cornell Notes
Codex is evolving into one unified engineering agent across IDEs, the terminal, GitHub, and cloud execution. Instead of siloed experiences, it now supports local changes, secure cloud sandbox runs that return PRs, and automatic GitHub code review that can validate intent by running code. The demos show how Codex navigates a TypeScript monorepo, updates real-time API demos, and delegates follow-on work to the cloud while local work continues. Running multiple parallel attempts helps avoid fragile “perfect prompt” dependency, and developers can later apply diffs or pull down results to iterate. The practical value is a workflow shift: treat Codex like an orchestrated engineering teammate—plan, test, review, and delegate—rather than a single chat session.
What does “Codex everywhere you code” mean in practice, and how do the interfaces differ?
How does Codex decide where work runs—locally or in the cloud—and why does that matter?
Why does repository structure—like monorepos and named packages—affect Codex performance?
What is the role of “agent mode” versus “chat mode,” and what safety tradeoff appears in the demo?
How does Codex reduce the risk of a bad one-shot change when the task is complex?
What makes Codex code review different from basic static analysis?
Review Questions
- How would you structure a repository (packages, naming, and documentation) to make Codex’s code navigation and parallel tasking more reliable?
- In a workflow with both local and cloud tasks, what steps would you take to keep changes testable without committing too early?
- When should you use parallel attempts versus planning with agents.md/plan markdown files for complex features?
Key Points
1. Codex is being unified into one agent experience across IDE extensions, the Codex CLI, GitHub code review, and asynchronous cloud task execution.
2. Codex can run work locally (pair-programming style changes) or remotely in a secure cloud sandbox (asynchronous PR generation), and results can be pulled back into the local repo state.
3. The IDE extension works in VS Code, Cursor, and VS Code-compatible forks, bringing CLI-like capabilities directly into the editor.
4. Agent mode supports change-making with an approval step, while chat mode is read-only; full-access modes can write outside the workspace and should be used carefully.
5. Running multiple parallel attempts helps avoid fragile “perfect prompt” dependency by producing several PR candidates for selection.
6. Codex code review can validate PR intent by running code in its own environment, not just performing static checks.
7. Using agents.md and planning complex changes in markdown improves Codex’s ability to follow conventions and coordinate multi-step work with higher success rates.