
Getting started with Codex

OpenAI · 6 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Codex is positioned as a coding agent that delegates routine work (like code review and refactors) so developers can focus on higher-leverage design and architecture tasks.

Briefing

Codex is OpenAI’s coding agent that developers can delegate routine, time-consuming work to—freeing them to focus on design, architecture, and other novel engineering decisions. Teams are using it for practical workflows like code review before pull requests merge, Slack-thread-driven changes that produce new PRs, and custom integrations via an SDK that can run Codex inside their own containers. Codex also supports parallel background work through cloud environments, enabling async tasks such as code review while developers are away.

A key theme throughout the onboarding is that Codex performance depends on context management and guardrails. Codex sessions don’t retain project knowledge between runs, so agents.md acts like a lightweight, repo-specific “readme” that gets automatically loaded whenever Codex starts in a directory. It provides the agent with a project overview, structure, build and test commands, useful CLI tools, and workflow guidance for implementing features end-to-end. For larger efforts, teams can keep agents.md brief (often under 100 lines) and link out to task-specific markdown files—like plans.md templates that let Codex generate and maintain living execution plans for multi-hour refactors.

Getting started centers on installation and authentication. Codex CLI can be installed via Homebrew (brew) or npm, with npm recommended for staying current because releases ship quickly. The VS Code extension is installed by searching for “OpenAI Codex” and enabling auto-update. Access uses a ChatGPT Enterprise account: the CLI supports sign-in via “codex login,” typically routing through internal SSO. Once authenticated, “/status” shows the active model, working directory, sandbox mode, approval policy, and remaining context window.

Customization happens through config.toml, which controls defaults such as the model, reasoning effort, sandbox mode, approval policy, and optional features like web search. By default, approval is set to request escalated permissions only when needed, while sandbox mode limits file writes to the current workspace. Prompting best practices emphasize anchoring the agent to the right files (using file mentions), starting with small tasks, and including verification steps such as running tests or linters. For debugging, pasting full stack traces helps Codex navigate directly to the source of errors.

On the tooling side, Codex works across surfaces: a terminal-focused CLI for lightweight interactions and a VS Code-based IDE extension for richer navigation and workflows. Developers can bind keyboard shortcuts to add context, use IDE “to-dos” to queue tasks inside code, and even drive changes from images—like screenshot-based instructions to modify UI elements. Sessions can be resumed with “codex resume,” which restores the prior conversation and its context.

The advanced portion highlights programmatic and enterprise-ready patterns. Codex CLI supports headless execution (“codex exec”) with structured JSON output that can be parsed and fed into CI/CD pipelines for tasks like code-quality scoring, security triage, and automated refactor checks. For tool augmentation, Codex integrates with MCP servers over standard I/O (stdio) or HTTP—examples include Jira, Linear, Figma, Datadog, and a demo “Cupcake MCP.” It can also pull up-to-date framework documentation via Context7, reducing issues caused by documentation drift.

Finally, Codex includes built-in code review flows (CLI and IDE) that focus on high-severity findings (P0/P1) to avoid noisy feedback. For teams that want deeper automation, options extend to on-prem code review, CI autofix loops, and issue auto-triage by intent-based labeling. The session closes with pointers to developers.openai.com/codex, cookbooks, change logs, and admin/security/rate-card resources for enterprise planning and governance.

Cornell Notes

Codex is OpenAI’s coding agent designed to take on routine engineering tasks—like code review, refactors, and documentation—so developers can spend more time on complex design work. Because coding agents don’t carry context across sessions, agents.md is the core mechanism for loading project-specific instructions automatically, including build/test commands and workflow guidance. Teams can keep agents.md short and link to task-specific docs (like plans.md) so Codex can manage multi-hour refactors with a living plan. Prompting quality matters: anchor Codex to relevant files, start with small tasks, and include verification steps such as running tests or linters. For scale and automation, Codex supports headless execution with structured JSON output and integrates with MCP servers (e.g., Jira/Linear/Figma) to pull external context and take actions.

Why does agents.md matter so much for Codex results?

Codex doesn’t retain project context between sessions, so each new run starts with a fresh context window. agents.md acts like a lightweight, repo-specific readme that Codex automatically loads when it starts in a directory containing that file. It typically includes a project overview and structure, pointers to key files, build and test commands, useful CLI tools, and a workflow for implementing features end-to-end. For larger work, teams can reference additional markdown files (e.g., plans.md) so Codex can do progressive discovery without bloating agents.md.
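As a sketch, an agents.md built along these lines might look like the following. The project name, commands, and file paths here are hypothetical—the point is the shape: overview, commands, workflow, and links out to task-specific docs.

```markdown
# Acme API — agent notes

## Overview
Python/FastAPI service. Source in `src/`, tests in `tests/`, docs in `docs/`.

## Build and test
- Install deps: `pip install -e ".[dev]"`
- Run tests: `pytest tests/ -q`
- Lint: `ruff check src/`

## Workflow
1. Read the relevant module in `src/` before editing.
2. Implement the change together with tests.
3. Run the lint and test commands above; fix failures before finishing.

## Larger tasks
For multi-hour refactors, create a plan from `docs/plans.md` and keep it
updated as steps complete.
```

Keeping the file this short leaves room in the context window for the code itself, while the links let Codex discover details progressively.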

How can teams keep agents.md effective without overwhelming the agent?

The recommended practice is to keep agents.md brief and focused—often under 100 lines—because too many instructions can confuse the agent or create conflicting guidance that wastes time resolving ambiguity. A second tactic is to unlock agentic loops: add commands and verification steps (like running tests or linters) so Codex can check its work and iterate faster. Finally, teams should update agents.md over time as Codex encounters mistakes or “gotchas,” adding the missing commands or references so future runs improve.

What’s the role of config.toml in controlling Codex behavior?

config.toml customizes defaults for Codex CLI sessions, including the default model, reasoning effort, sandbox mode, approval policy, and feature toggles like web search. The session defaults described include approval mode set to request escalated permissions only when needed, and sandbox mode set to “workspace-write,” meaning Codex writes only within the current directory rather than outside it. Teams can also define profiles (e.g., a “fast” profile) to switch between configurations quickly.
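A config.toml along those lines might look like the sketch below. Key names follow the Codex CLI configuration docs at a high level, but treat the specific values as illustrative rather than recommended settings.

```toml
# Illustrative Codex CLI defaults; check the Codex docs for the full key list.
model = "gpt-5"
model_reasoning_effort = "medium"
approval_policy = "on-request"      # ask for escalated permissions only when needed
sandbox_mode = "workspace-write"    # file writes limited to the current workspace

[tools]
web_search = true                   # optional feature toggle

# A named profile for quick switching, e.g. `codex --profile fast`
[profiles.fast]
model_reasoning_effort = "low"
```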

What prompting habits improve Codex reliability during coding and debugging?

First, anchor prompts by pointing Codex to specific files or starting points using file mentions, so it doesn’t wander into irrelevant parts of the codebase. Second, begin with small tasks to validate behavior before scaling up. Third, include verification steps—tests, linters, or explicit completion requirements—so progress can be measured. For debugging, pasting the full stack trace gives Codex the detail needed to locate the error source. Codex can also break larger tasks into smaller units by researching the codebase first.
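Putting those habits together, a well-anchored debugging prompt might look like the sketch below. The file names and error are hypothetical; the pattern is a file mention to anchor the search, the full stack trace, and an explicit verification step.

```text
In @src/auth/session.py, fix the token-refresh bug below. Start by reading
@tests/test_session.py for the expected behavior.

Stack trace:
Traceback (most recent call last):
  File "src/auth/session.py", line 42, in refresh
    token = payload["refresh_token"]
KeyError: 'refresh_token'

When done, run `pytest tests/test_session.py -q` and confirm it passes.
```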

How do MCP servers extend Codex beyond local code?

MCP (Model Context Protocol) connects Codex to external tools and context. Codex supports MCP over standard I/O (stdio) and HTTP. Common MCP servers mentioned include Figma (front-end designs from mockups), Jira and Linear (ticket context and updates), Context7 (up-to-date framework documentation), and Datadog (production diagnostics). A demo “Cupcake MCP” shows how Codex can call an MCP server to fetch data (e.g., Rachel’s cupcake order) and then write the result into agents.md. This pattern generalizes to pulling logs, tickets, or other live context for coding tasks.
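Stdio MCP servers are registered in config.toml, where Codex launches each one as a subprocess. A sketch along those lines is below; the server names and launch commands are illustrative (the Context7 package name reflects its public npm distribution, but verify before relying on it).

```toml
# Registering MCP servers with Codex CLI (entries are illustrative).
[mcp_servers.context7]
command = "npx"
args = ["-y", "@upstash/context7-mcp"]

# A local demo server, launched over stdio.
[mcp_servers.cupcake]
command = "python"
args = ["cupcake_server.py"]
```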

What makes Codex useful for automation in CI/CD pipelines?

Codex CLI can run headlessly via “codex exec” and output results in a structured JSON format that follows an OpenAI structured output schema. In the example, Codex streams logs and then prints valid JSON containing fields like total files analyzed, total issues, a 0–100 score, and per-issue details (severity, description, citations, line numbers). That structured output can be parsed with tools like jq and used to trigger downstream actions—such as security triage, test coverage bots, refactor automation, or release hygiene steps.
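A minimal sketch of consuming such output in a pipeline step might look like this. The field names mirror the ones described above but are a hypothetical schema, not the actual output format—your structured-output spec defines the real shape.

```python
import json

# Hypothetical JSON from a headless `codex exec` run (illustrative schema).
raw = """
{
  "total_files_analyzed": 42,
  "total_issues": 3,
  "score": 81,
  "issues": [
    {"severity": "P1", "description": "SQL built via string concatenation",
     "file": "src/db.py", "line": 88},
    {"severity": "P2", "description": "Overly broad except clause",
     "file": "src/api.py", "line": 17},
    {"severity": "P2", "description": "Unpinned dependency",
     "file": "requirements.txt", "line": 3}
  ]
}
"""

def gate(report: dict, min_score: int = 80) -> bool:
    """Fail the pipeline on any high-severity finding or a low score."""
    high = [i for i in report["issues"] if i["severity"] in ("P0", "P1")]
    return report["score"] >= min_score and not high

report = json.loads(raw)
print(gate(report))  # a P1 finding is present, so the gate fails: False
```

The same check could be done in shell with jq; parsing in the pipeline's own language just makes the gating logic easier to test.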

Review Questions

  1. What should an agents.md file include to help Codex implement features end-to-end, and why is keeping it under ~100 lines recommended?
  2. How do sandbox mode and approval policy work together to control what Codex can change during a session?
  3. Describe two ways Codex can be integrated into a team workflow using either MCP servers or code review commands.

Key Points

  1. Codex is positioned as a coding agent that delegates routine work (like code review and refactors) so developers can focus on higher-leverage design and architecture tasks.

  2. agents.md is the primary context mechanism because Codex doesn’t retain project knowledge across sessions; it loads automatically when present in the working directory.

  3. Keep agents.md brief and focused, and link to task-specific markdown (like plans.md) for multi-step or multi-hour work so Codex can progressively discover details.

  4. Use config.toml to set safe defaults (workspace-limited sandbox writes and approval-on-escalation) and to control features like web search and reasoning effort.

  5. Prompting works best when Codex is anchored to relevant files, tasks start small, and prompts include verification steps such as tests and linters.

  6. Codex extends beyond local code through MCP servers (e.g., Jira/Linear/Figma/Context7/Datadog), letting it pull live external context and act on it.

  7. For automation, Codex CLI supports headless execution with structured JSON output that can be parsed and wired into CI/CD pipelines.

Highlights

agents.md functions as a lightweight, repo-specific readme that Codex loads automatically, compensating for the lack of cross-session memory.
config.toml defaults emphasize safety: approval requests only when escalated permissions are needed, and sandbox mode restricts writes to the current workspace.
Codex can generate and maintain living plans for large refactors by using plans.md templates referenced from agents.md.
MCP integration lets Codex fetch real-time external context (like tickets or documentation) and then write results back into the codebase.
Headless “codex exec” can emit structured JSON for CI/CD use cases such as code-quality scoring and issue triage.

Topics

  • Codex Onboarding
  • agents.md
  • config.toml
  • Prompting Best Practices
  • MCP Integration
  • Code Review Automation
  • Headless Structured Output

Mentioned