
Automatic code reviews with OpenAI Codex

OpenAI · 5 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.

TL;DR

Enable Codex in repository settings to automatically review pull requests as part of normal GitHub workflows.

Briefing

OpenAI Codex is rolling out automatic code review that can run in both GitHub workflows and the local terminal, aiming to catch real bugs without flooding teams with low-value comments. The core idea is simple: enable Codex in a repository, and every pull request can be automatically reviewed—either immediately or on demand—so human reviewers spend less time on routine verification and more time on higher-stakes judgment.

The workflow is designed to fit existing team habits. Codex can be turned on in Codex web settings so PRs marked “ready for review” get picked up automatically. Teams can also control timing by triggering reviews while a PR is still in draft, using a comment command like “Codex review this PR.” That lets developers get early feedback without exposing work to the full team.
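For teams that script their PR workflow, the comment trigger can be sent from the terminal. This is a hedged sketch using the GitHub CLI; the PR number is illustrative, and the exact trigger phrase should match what your Codex setup expects (the video uses “Codex review this PR”).

```shell
# Trigger a Codex review on a draft PR by posting the trigger comment.
# "123" is a hypothetical PR number; replace with your own.
gh pr comment 123 --body "Codex review this PR"

# Later, mark the draft as ready for review; if automatic review is
# enabled in Codex web settings, the PR is then picked up automatically.
gh pr ready 123
```

The draft-stage comment gives the author early feedback before any human reviewers are requested.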

A key capability is that Codex code review isn’t limited to scanning diffs. The model can access the entire repository, follow dependencies, and reason about changes in the context of a larger codebase—an important distinction for complex projects where contributors don’t fully understand every subsystem. When it finds a potential issue, it can go further by forming hypotheses, writing and running Python code to test those hypotheses, and validating findings with concrete examples rather than relying on surface-level static analysis.

Training focuses on bug-finding that people would actually fix. Codex code review models were trained with specific tasks that prioritize catching bugs with practical impact, and evaluations emphasize high precision—particularly reducing incorrect comments compared with previous generations. Still, the ultimate test is real-world usage: finding genuine issues while staying quiet enough not to annoy developers.

Internally at OpenAI, Codex has already been used to prevent critical problems, including bugs that could delay important training runs and configuration issues that wouldn’t be obvious from a diff alone. It also supports safer contributions to unfamiliar code: for example, a Codex review flagged an implementation mistake in a VS Code extension workflow, and a follow-up interaction allowed Codex to take over and fix the issue after the initial finding.

Beyond default behavior, Codex is described as steerable through custom instructions. Teams can add code review guidelines—what to pay special attention to, what to ignore, and even response style—either via user instructions or by using agents.md files within the codebase. That makes the review process adaptable to different engineering standards.
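As an illustration of such guidelines, here is a hypothetical agents.md excerpt. The headings and wording are invented for this sketch, not an official schema; the point is that focus areas, exclusions, and response style can be stated in plain prose inside the repository.

```markdown
<!-- Hypothetical agents.md excerpt: headings and rules are illustrative. -->
# Code review guidelines

## Focus on
- Concurrency bugs and unguarded shared state
- Off-by-one errors in pagination and batching code

## Ignore
- Pure formatting nits already covered by the linter

## Response style
- Keep each comment to one or two sentences
- Include a minimal reproducing example when flagging a bug
```

Because the file lives in the codebase, the guidelines version alongside the code they govern.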

Finally, Codex code review is expanding beyond the cloud. A new “review in the Codex CLI” feature enables terminal-based review of local changes before they become a GitHub PR, with the model able to execute more checks locally. The stated goal is to improve this feature over the coming weeks so teams can catch bugs earlier, reduce production risk, and accelerate shipping with less manual verification burden.
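A minimal local loop might look like the following sketch. It assumes the `codex` CLI is installed and authenticated; the session commands are shown as comments because the review runs interactively.

```shell
# Hedged sketch of a pre-PR review loop with the Codex CLI.
cd my-repo       # a repository with uncommitted local changes
codex            # start an interactive Codex session in the repo
# Inside the session, ask for a review of the current local changes:
#   /review
# Address the findings, then open the GitHub PR once the review is clean.
```

Running the review before the PR exists shifts verification earlier, so obvious bugs never reach teammates.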

Cornell Notes

OpenAI Codex is being positioned as an always-on code reviewer that fits into existing engineering workflows. After enabling it for a repository, Codex can automatically review pull requests on GitHub, and it can also review changes locally through the Codex CLI. Unlike diff-only tools, Codex can inspect the full repository, trace dependencies, and validate findings by writing and running code to test hypotheses. Training emphasizes high-precision bug detection—prioritizing issues developers would actually fix—while reducing incorrect comments. Teams can further steer reviews using custom instructions or agents.md guidelines, tailoring what Codex should focus on or ignore.

How does Codex integrate into GitHub pull request workflows, and what control do teams have over when reviews run?

Codex can be enabled in Codex web settings for a repository. Once enabled, PRs submitted to that repo can be automatically reviewed—such as when a PR is marked “ready for review.” Teams can also trigger reviews manually by commenting “Codex review this PR,” including while the PR is still in draft, before human reviewers are brought in. That draft-stage approach supports early feedback without exposing incomplete work.

Why is Codex described as more than a static analyzer?

Codex code review is described as having access to the whole repository, not just the diff. It can track down dependencies and understand changes in the broader codebase, which matters when contributors don’t know every part of a complex system. It can also validate suspected issues by forming hypotheses and writing Python code to test them, then showing results on examples—turning findings into something test-backed rather than purely speculative.

What training and evaluation choices are meant to improve the usefulness of Codex reviews?

Codex code review models include specific training tasks that prioritize catching bugs people would be willing to fix in real life. Evaluations emphasize high precision, including a lower incorrect-comments rate compared with the previous generation of models. The ultimate measure is practical adoption: whether it finds real bugs while avoiding excessive, annoying commentary.

How has Codex been used internally to reduce risk or speed up engineering work?

Internally, Codex has reportedly prevented critical issues, including bugs that could delay important training runs and configuration problems not visible from a diff alone. It also helps engineers contribute to unfamiliar areas more confidently—for instance, a Codex review flagged an incorrect approach in a VS Code extension change, and a follow-up workflow allowed Codex to take over to fix the issue after the initial finding.

What mechanisms let teams customize what Codex pays attention to during review?

Codex can be steered with custom user instructions and by detecting agents.md files within the codebase. Teams can add code review guidelines that tell the model what to focus on, what requirements matter, and what problems to ignore so it doesn’t distract developers. Custom instructions can also influence response style, such as requesting more validation behavior.

How does local review in the Codex CLI change the review timeline?

Codex code review is also available in the Codex CLI, enabling terminal-based review before code reaches GitHub. A typical workflow is to run the `/review` slash command to tell the model to review current local changes. Because the CLI can execute more things locally, it’s positioned as a way to catch bugs earlier—before a PR is created.

Review Questions

  1. What specific capabilities beyond diff-scanning does Codex use to validate code review findings?
  2. How do precision-focused training and evaluation metrics relate to developer trust in automated review?
  3. In what ways can agents.md or custom instructions shape Codex’s review behavior, and why does that matter for team workflows?

Key Points

  1. Enable Codex in repository settings to automatically review pull requests as part of normal GitHub workflows.

  2. Use draft-stage triggering (e.g., commenting “Codex review this PR”) to get early feedback without involving the full team.

  3. Codex reviews can inspect the entire repository, trace dependencies, and test hypotheses by writing and running code rather than relying only on diffs.

  4. Training prioritizes bug detection that developers would actually fix, with evaluations emphasizing high precision and fewer incorrect comments.

  5. Codex supports an iterative workflow: after a finding, teams can ask Codex to take over and fix the issue.

  6. Codex can be steered via custom instructions or agents.md guidelines to match team standards and reduce irrelevant commentary.

  7. Codex CLI enables local review in the terminal so bugs can be caught before changes become a GitHub PR.

Highlights

Codex code review is described as tool-using and test-oriented: it can access the full repository and validate suspected issues by writing and running Python code.
High precision is a central design goal—reducing incorrect comments compared with prior Codex generations—so reviews stay useful instead of noisy.
Teams can control review timing by triggering Codex during PR draft stages, then escalating to human review later.
Custom review behavior can be encoded through agents.md, letting teams specify what to focus on, what to ignore, and how feedback should be formatted.
Local review via the Codex CLI shifts verification earlier, enabling terminal checks before code ever hits GitHub.

Topics

  • Automatic Code Review
  • Codex GitHub Integration
  • Repository-Wide Analysis
  • Precision Bug Detection
  • Codex CLI Local Review
