Automatic code reviews with OpenAI Codex
Based on OpenAI's video on YouTube. If you find this content useful, support the original creators by watching, liking, and subscribing.
Briefing
OpenAI Codex is rolling out automatic code review that can run in both GitHub workflows and the local terminal, aiming to catch real bugs without flooding teams with low-value comments. The core idea is simple: enable Codex in a repository, and every pull request can be automatically reviewed—either immediately or on demand—so human reviewers spend less time on routine verification and more time on higher-stakes judgment.
The workflow is designed to fit existing team habits. Codex can be turned on in Codex web settings so PRs marked “ready for review” get picked up automatically. Teams can also control timing by triggering reviews while a PR is still in draft, using a comment command like “Codex review this PR.” That lets developers get early feedback without exposing work to the full team.
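The draft-stage flow above can be sketched with the GitHub CLI. The trigger phrase comes from the video; the `gh` commands, branch state, and PR title are illustrative, and the exact comment Codex listens for may differ in your setup:

```shell
# Open a draft PR so the change isn't yet visible as "ready for review".
gh pr create --draft --title "WIP: refactor auth flow" --body "Early draft, not ready for team review"

# Request an early Codex review on the draft via a PR comment
# (trigger phrasing as described in the video; confirm the exact
# command your Codex integration expects).
gh pr comment --body "Codex review this PR"
```

Once the PR is marked ready for review, the automatic review kicks in without any comment at all.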
A key capability is that Codex code review isn’t limited to scanning diffs. The model can access the entire repository, follow dependencies, and reason about changes in the context of a larger codebase—an important distinction for complex projects where contributors don’t fully understand every subsystem. When it finds a potential issue, it can go further by forming hypotheses, writing and running Python code to test those hypotheses, and validating findings with concrete examples rather than relying on surface-level static analysis.
Training focuses on bug-finding that people would actually fix. Codex code review models were trained with specific tasks that prioritize catching bugs with practical impact, and evaluations emphasize high precision—particularly reducing incorrect comments compared with previous generations. Still, the ultimate test is real-world usage: finding genuine issues while staying quiet enough not to annoy developers.
Internally at OpenAI, Codex has already been used to prevent critical problems, including bugs that could delay important training runs and configuration issues that wouldn’t be obvious from a diff alone. It also supports safer contributions to unfamiliar code: for example, a Codex review flagged an implementation mistake in a VS Code extension workflow, and in a follow-up interaction Codex took over and fixed the issue it had found.
Beyond default behavior, Codex is described as steerable through custom instructions. Teams can add code review guidelines—what to pay special attention to, what to ignore, and even response style—either via user instructions or by using agents.md files within the codebase. That makes the review process adaptable to different engineering standards.
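As a rough sketch, review guidance in an agents.md file might look like the following. The section headings and phrasing here are illustrative, not a documented schema; the source only states that guidelines can cover focus areas, things to ignore, and response style:

```markdown
# AGENTS.md (illustrative code review guidance)

## Code review
- Pay special attention to concurrency bugs, unchecked error paths,
  and configuration changes that affect production.
- Ignore purely stylistic issues; formatting is enforced by CI.
- Keep review comments short and actionable, referencing the
  relevant file and line.
```

Because the file lives in the codebase, the same guidance applies to every reviewer, human or automated, and travels with the repository.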
Finally, Codex code review is expanding beyond the cloud. A new “review in the Codex CLI” feature enables terminal-based review of local changes before they become a GitHub PR, with the model able to execute more checks locally. The stated goal is to improve this feature over the coming weeks so teams can catch bugs earlier, reduce production risk, and accelerate shipping with less manual verification burden.
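A terminal session using the local review feature might look roughly like this. The command surface here is an assumption based on the description above; availability and exact invocation may vary by Codex CLI version:

```shell
# From the repository root, with uncommitted or unpushed changes present,
# start the Codex CLI (assumes the `codex` binary is installed and authenticated).
cd my-repo
codex
# ...then, inside the Codex session, request a review of the local
# changes (e.g. via its review command) before opening a GitHub PR.
```

Reviewing locally moves bug-finding earlier in the timeline: issues surface before a PR exists, so nothing broken needs to be pushed, reviewed, and re-pushed.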
Cornell Notes
OpenAI Codex is being positioned as an always-on code reviewer that fits into existing engineering workflows. After enabling it for a repository, Codex can automatically review pull requests on GitHub, and it can also review changes locally through the Codex CLI. Unlike diff-only tools, Codex can inspect the full repository, trace dependencies, and validate findings by writing and running code to test hypotheses. Training emphasizes high-precision bug detection—prioritizing issues developers would actually fix—while reducing incorrect comments. Teams can further steer reviews using custom instructions or agents.md guidelines, tailoring what Codex should focus on or ignore.
How does Codex integrate into GitHub pull request workflows, and what control do teams have over when reviews run?
Why is Codex described as more than a static analyzer?
What training and evaluation choices are meant to improve the usefulness of Codex reviews?
How has Codex been used internally to reduce risk or speed up engineering work?
What mechanisms let teams customize what Codex pays attention to during review?
How does local review in the Codex CLI change the review timeline?
Review Questions
- What specific capabilities beyond diff-scanning does Codex use to validate code review findings?
- How do precision-focused training and evaluation metrics relate to developer trust in automated review?
- In what ways can agents.md or custom instructions shape Codex’s review behavior, and why does that matter for team workflows?
Key Points
1. Enable Codex in repository settings to automatically review pull requests as part of normal GitHub workflows.
2. Use draft-stage triggering (e.g., commenting “Codex review this PR”) to get early feedback without involving the full team.
3. Codex reviews can inspect the entire repository, trace dependencies, and test hypotheses by writing and running code rather than relying only on diffs.
4. Training prioritizes bug detection that developers would actually fix, with evaluations emphasizing high precision and fewer incorrect comments.
5. Codex supports an iterative workflow: after a finding, teams can ask Codex to take over and fix the issue.
6. Codex can be steered via custom instructions or agents.md guidelines to match team standards and reduce irrelevant commentary.
7. Codex CLI enables local review in the terminal so bugs can be caught before changes become a GitHub PR.