How to Run OpenCode Inside an Autonomous Claude Code AI Agent
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
An autonomous Claude Code agent can now run OpenCode via a simple CLI command, swap in different OpenRouter models, and generate side-by-side benchmark videos automatically. The practical payoff: one prompt can trigger multiple model runs in parallel, producing HTML outputs that get converted into a grid-style MP4 and then packaged for an X post, turning model comparison into a repeatable workflow rather than a manual testing chore.
The setup starts by pulling OpenCode’s CLI documentation and feeding it into Claude Code so it can discover the correct run subcommand and model flag. The key command format that emerges is roughly `opencode run --model <provider>/<model> "<prompt>"`, with the provider/model naming matching OpenRouter’s identifiers. With an example prompt asking whether someone should “walk or drive to the car wash” 50 meters away, the workflow successfully returns a model-specific answer. Testing across models shows the behavior changes as expected: one model recommends walking, while another recommends driving, aligning with the “can’t wash the car if you leave it behind” logic.
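For concreteness, a minimal shell sketch of that single-model invocation might look like the following; the OpenRouter model slug is illustrative, and the exact flag spelling should be verified against OpenCode’s own CLI docs rather than taken from this summary.

```bash
# Minimal sketch: ask one OpenRouter-hosted model through OpenCode's run command.
# The model slug below is illustrative, not the exact one used in the video.
opencode run --model openrouter/google/gemini-3-pro \
  "The car wash is 50 meters away. Should I walk or drive there?"
```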
Once the CLI invocation works, the workflow is converted into a reusable Claude Code “skill.” The skill is designed for parallel execution: run the same prompt against multiple OpenRouter models, capture each output, and save the results into a consistent folder structure. That structure becomes crucial for the next step: benchmarking creative generation.
For the creative benchmark, the prompt instructs the system to generate a single full-screen animated retro arcade “space battle” scene in HTML5. The agent saves each run as a model-labeled HTML file (for example, game_<model>.html) inside a dedicated experiment directory. The transcript then demonstrates running four models simultaneously—GLM5, Minimax 2.5, Gemini 3 Pro, and Opus 4.6—so the comparison happens quickly and consistently under identical prompt conditions.
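A hedged sketch of what that parallel skill could boil down to is shown below; the model slugs, prompt wording, and directory names are illustrative stand-ins for the four models named above, not commands taken verbatim from the video.

```bash
#!/usr/bin/env bash
# Sketch of the parallel benchmark: same creative prompt, one OpenCode run per
# model, each asked to save its result as game_<model>.html in one experiment dir.
# Model slugs are illustrative placeholders for the four models used in the video.
EXPERIMENT_DIR="experiments/retro-space-battle"
mkdir -p "$EXPERIMENT_DIR"

MODELS=(
  "openrouter/z-ai/glm-5"
  "openrouter/minimax/minimax-2.5"
  "openrouter/google/gemini-3-pro"
  "openrouter/anthropic/claude-opus-4.6"
)

for MODEL in "${MODELS[@]}"; do
  LABEL=$(basename "$MODEL")   # e.g. glm-5 -> game_glm-5.html
  opencode run --model "$MODEL" \
    "Create a single full-screen animated retro arcade space battle scene in HTML5 and save it to ${EXPERIMENT_DIR}/game_${LABEL}.html" &
done
wait  # block until all four runs have finished before rendering the video
```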
After the HTML files land, a Remotion-based skill turns them into a single grid-style video. The result is a side-by-side visual comparison where each panel is labeled by model name, making differences in animation style and scene composition easy to spot at a glance. The workflow is then consolidated into one pipeline that produces an MP4 for the “retro space battle benchmark.”
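Assuming the Remotion skill is an ordinary Remotion project with a grid composition, the final render step could be as small as the call below; the composition ID (“ModelGrid”), the props, and the output path are hypothetical, since the summary does not show the skill’s internals.

```bash
# Hypothetical render call: "ModelGrid" is an assumed composition ID, and the
# props simply point the composition at the experiment directory of HTML files.
npx remotion render ModelGrid out/retro-space-battle-benchmark.mp4 \
  --props='{"htmlDir":"experiments/retro-space-battle"}'
```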
The final automation step prepares social sharing: the agent creates an X draft by attaching the MP4 and generating a caption that summarizes the experiment—“four LLMs” given the same prompt and building the same HTML5 demo, with results shown side by side. The overall message is less about any single model’s quality and more about building an agent skill that can repeatedly generate, render, and package model comparisons—ready for ongoing testing as new models appear.
Cornell Notes
The workflow builds an autonomous testing skill that runs OpenCode from Claude Code using a CLI-style command discovered from OpenCode’s documentation. With OpenRouter, the same prompt can be executed across multiple models in parallel (example models include GLM5, Minimax 2.5, Gemini 3 Pro, and Opus 4.6), and each output is saved as a model-specific HTML file. Those HTML files are then converted into a single grid-style MP4 using a Remotion skill, enabling quick visual side-by-side benchmarking. The pipeline ends by preparing an X draft that attaches the MP4 and generates a caption describing the comparison. This matters because it turns model evaluation, especially creative HTML generation, into a repeatable, automated routine.
How does Claude Code learn to run OpenCode from the command line?
What command structure enables model switching through OpenRouter?
How is the benchmark prompt turned into a parallelizable experiment?
Why does saving model-specific HTML files matter for the video comparison?
What does the pipeline automate at the end for sharing?
Review Questions
- What is the role of OpenCode’s CLI documentation in building the Claude Code skill?
- How does the workflow ensure that different models are compared fairly?
- Describe the sequence from model execution to MP4 creation and then to an X draft.
Key Points
1. Claude Code can run OpenCode via a CLI-style command discovered from OpenCode’s documentation, enabling automated model execution.
2. OpenRouter model/provider switching lets the same prompt produce different outputs across multiple LLMs.
3. A reusable Claude Code skill supports parallel runs, saving each model’s output as a model-labeled HTML file for later processing.
4. Remotion converts the set of generated HTML files into a single grid-style MP4, making creative comparisons visually consistent.
5. The pipeline can package results for social sharing by generating an X draft with the MP4 and an experiment caption.
6. The workflow is designed for repeatable benchmarking, so new models can be added to the parallel list without rebuilding the process.