OpenAI Codex Coding Agent with O4-mini | Claude Code Killer?
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
OpenAI Codex CLI is positioned as a lightweight “coding agent” that runs directly in a developer’s terminal, and early hands-on testing suggests it can build and operate an MCP server successfully on the first attempt, then drive a real workflow inside Claude Code. The practical payoff is speed and reduced cost risk: the tester runs the agent with the O4-mini model, scaffolds a TypeScript MCP server, wires it into Claude Code, and gets a working video-generation URL from Kling AI using a Replicate API token.
Setup starts with installing the Codex CLI via npm and configuring an OpenAI API key. After installation, Codex prompts for confirmation and defaults to O4-mini. From there, its workflow resembles Claude Code’s: it can read and summarize existing documentation, search a codebase, and generate new files. The tester adds documentation on MCP server building (including Kling AI notes), then asks Codex to summarize that material and use it to implement an MCP server workflow: a server tool that accepts a string argument, calls the Kling AI video generator through the Replicate API, and returns a URL.
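To make that tool concrete, here is a minimal sketch of what such a server could look like, assuming the official @modelcontextprotocol/sdk TypeScript package, zod for the argument schema, and Replicate’s HTTP predictions API. The tool name generate_video, the Kling model slug, and the shape of the prediction output are illustrative assumptions, not the exact code Codex generated.

```typescript
// Minimal MCP server sketch: one tool that takes a prompt string,
// calls a Kling video model on Replicate, and returns the video URL.
// Assumes REPLICATE_API_TOKEN is set; the model slug is illustrative.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "kling-ai-node", version: "0.1.0" });

server.tool(
  "generate_video",               // hypothetical tool name
  { prompt: z.string() },         // single string argument, as described
  async ({ prompt }) => {
    // Create a prediction; "Prefer: wait" asks Replicate to hold the
    // connection open until the prediction finishes (its sync mode).
    const res = await fetch(
      "https://api.replicate.com/v1/models/kwaivgi/kling-v1.6-standard/predictions",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.REPLICATE_API_TOKEN}`,
          "Content-Type": "application/json",
          Prefer: "wait",
        },
        body: JSON.stringify({ input: { prompt } }),
      }
    );
    if (!res.ok) {
      throw new Error(`Replicate request failed: ${res.status}`);
    }
    // Output shape is assumed here; Kling models on Replicate
    // typically return the generated video as a URL string.
    const prediction = (await res.json()) as { output?: string };
    return {
      content: [{ type: "text", text: prediction.output ?? "no output" }],
    };
  }
);

// Serve over stdio so an MCP client can launch it as a local process.
await server.connect(new StdioServerTransport());
```

Compiled with tsc, this runs as a local stdio process that an MCP client such as Claude Code launches directly.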
A key feature tested is Codex’s “full auto” mode with safety constraints. In this mode, Codex scaffolds files inside a sandbox, installs missing dependencies, and runs with the network disabled and directory sandboxing enabled, intended to keep execution safer while still allowing automation. The agent iterates through approvals, creates an MCP server directory, and produces concrete build steps: cd into the server folder, run npm install, export the needed environment variables, and run npm run build. When shell commands fail, the errors surface clearly, and subsequent attempts fix the issues quickly. The build completes without errors, and the tester exports the Replicate token and registers the MCP server using Claude Code’s MCP add flow.
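For the registration itself, Claude Code’s CLI exposes an add flow along the lines of `claude mcp add kling-ai-node -e REPLICATE_API_TOKEN=<token> -- node /path/to/server/build/index.js`, where the server name and path here are placeholders; passing the token with the -e environment-variable flag at registration time is what makes a separately exported variable unnecessary.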
The integration then moves into Claude Code. The tester adds the MCP server (named “kling AI node”) and verifies connectivity, then sends a prompt for a high-speed action car-chase drone shot. Claude Code receives a URL in response, and the generated video plays, confirming the end-to-end pipeline works: Codex-built MCP server → Claude Code tool call → Kling AI video generation via Replicate.
Cost monitoring is treated as an open question. The tester notes that O3 pricing is expensive (reasoning models consume many tokens because of hidden “thinking” tokens), while O4-mini is far cheaper (reported at roughly $1.10 per million input tokens and $4.40 per million output tokens). They check the OpenAI dashboard during the run but don’t see costs appear immediately, planning to verify later. Model switching inside the Codex CLI is also tested: O4-mini is available and the model selector lists many options, but image generation fails in this environment.
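For scale, assuming those reported rates and some hypothetical token counts: a session consuming 200,000 input tokens and 50,000 output tokens would cost about 0.2 × $1.10 + 0.05 × $4.40 ≈ $0.44 on O4-mini, which illustrates why long hidden “thinking” traces on a pricier model like O3 can dominate the bill.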
Overall, the first impression is that Codex with O4-mini can deliver a working MCP server quickly and reliably; the main remaining uncertainty is long-term performance and whether the cost advantage holds against Claude Code’s more expensive setups.
Cornell Notes
Codex CLI, running in a terminal, can scaffold and build a TypeScript MCP server that integrates with Claude Code. Using O4-mini as the default model, the tester runs Codex in “full auto” mode with sandboxing and the network disabled, then iterates through approvals until the server builds cleanly. The resulting MCP server accepts a prompt string, calls the Replicate API for Kling AI video generation, and returns a URL. After the MCP server is registered in Claude Code, the same video prompt produces a working video on the first try. The main unresolved variable is total cost, since O3-style reasoning can be expensive while O4-mini is reported to be much cheaper.
- What did Codex CLI successfully produce in this test, and why does that matter for developers?
- How did “full auto” mode change the workflow, and what safety constraints were mentioned?
- What evidence showed the MCP server worked on the first try?
- What role did tokens and model choice play in the cost discussion?
- How was the MCP server configured for secrets, and what change improved usability?
- What limitations appeared during model testing?
Review Questions
- What specific tool behavior did the MCP server implement (inputs/outputs), and how did Claude Code use it to generate a video URL?
- Which automation mode and sandboxing constraints were used, and how did that affect error handling during the build?
- Why did the tester consider O4-mini potentially cheaper than O3, and what token behavior drove that expectation?
Key Points
1. Codex CLI can scaffold and build a TypeScript MCP server directly from terminal automation, then connect it to Claude Code.
2. Using O4-mini as the default model, the tester achieved a working MCP-to-Claude Code integration on the first attempt.
3. Full auto mode enabled sandboxed scaffolding and dependency installation, with a disabled network and directory sandboxing cited as safety constraints.
4. The MCP server tool accepted a prompt string, called Kling AI via the Replicate API, and returned a video URL that Claude Code displayed.
5. Passing the Replicate API token as a CLI argument during MCP registration made setup easier than relying on exported environment variables.
6. Cost remains a key uncertainty: O3-style reasoning was described as expensive due to heavy token usage, while O4-mini was expected to reduce spend.
7. Model switching in Codex may require starting a new chat, and image generation failed due to environment limitations.