Self-Correcting AI Coding Agent + Prompting Deep Dive
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
An AI coding agent can reliably turn a plain goal into working software by running generated code, capturing terminal errors, and feeding those errors back into a self-correction loop—then repeating the cycle while also prompting the system to add new features. The core mechanism is straightforward: generate code from a goal, install dependencies, execute it, and if execution fails, rewrite the code using the exact error message until it runs or a maximum number of attempts is reached. That “error-to-fix-to-retry” workflow matters because it shifts coding from a one-shot output to an iterative engineering loop, where failures become direct instructions for the next revision.
The build process starts with prompt engineering. For each goal, the system uses two system messages: one to instruct the model to produce Python code for the requested app, and another to enforce a strict output format by removing any non-executable text. A separate cleanup step strips away extra wrappers like leading “python” markers so the agent can save and run clean source files. After code generation, the agent scans for import statements and installs required packages via pip, then attempts execution.
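The cleanup and dependency steps described above can be sketched as follows. This is a minimal illustration, not the video's actual code: the function names, the exact artifacts stripped, and the best-effort pip strategy are all assumptions.

```python
import re
import subprocess
import sys

def clean_code(raw: str) -> str:
    """Strip markdown fences and a stray leading 'python' marker from model output."""
    text = raw.strip()
    # Remove a wrapping ``` fence if present (the opening line may be ```python)
    if text.startswith("```"):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    # Drop a bare "python" language marker left on its own first line
    if text.lstrip().lower().startswith("python\n"):
        text = text.lstrip()[len("python\n"):]
    return text.strip() + "\n"

def find_imports(code: str) -> list[str]:
    """Collect top-level module names from import statements."""
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
    return sorted(set(pattern.findall(code)))

def install_packages(packages: list[str]) -> None:
    """Best-effort pip install; stdlib modules simply fail to install, harmlessly."""
    for pkg in packages:
        subprocess.run([sys.executable, "-m", "pip", "install", pkg])
```

A real implementation would also map import names to package names (e.g. `bs4` installs as `beautifulsoup4`), which this sketch deliberately omits.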
When runtime errors occur, the agent captures the terminal error message and sends it back into a dedicated self-correction prompt: “The following python code has an error… please fix the error and provide the corrected code.” The corrected code then goes through the same cleanup, dependency-install, and execution pipeline. A configurable “max attempts” limit—set to five in the walkthrough—prevents infinite loops and forces the system to either converge or stop.
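Put together, the error-to-fix-to-retry loop has roughly this shape. The `ask_model` hook, the file name, and the exact prompt wording are illustrative assumptions; only the overall structure (run, capture stderr, re-prompt, cap at five attempts) follows the walkthrough.

```python
import subprocess
import sys

MAX_ATTEMPTS = 5  # bounded retries, as in the walkthrough

def run_with_self_correction(code: str, ask_model) -> bool:
    """Execute code; on failure, feed the exact stderr back to the model and retry."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        with open("app.py", "w") as f:
            f.write(code)
        result = subprocess.run(
            [sys.executable, "app.py"], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True  # converged: the program ran cleanly
        # Self-correction prompt: pair the current code with the exact error message
        prompt = (
            "The following python code has an error:\n"
            f"{code}\n\nError message:\n{result.stderr}\n"
            "Please fix the error and provide the corrected code."
        )
        code = ask_model(prompt)  # ask_model is an assumed LLM-call hook
    return False  # give up rather than loop forever
```

In the full pipeline the model's reply would pass through the cleanup and dependency-install steps before each rerun; that wiring is omitted here for brevity.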
Once the code runs, the agent enters a second loop focused on expansion. It prompts the model with the current code plus a list of desired enhancements, asking for an updated full program that implements new features. After that, the agent again cleans the output, installs dependencies, and re-executes—so feature additions are validated by actual runs, not just by code review.
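The expansion loop amounts to a second prompt template. A hedged sketch, where the wording and the `ask_model` hook are assumptions:

```python
def request_features(current_code: str, enhancements: list[str], ask_model) -> str:
    """Ask the model for a complete updated program that adds the listed features."""
    feature_list = "\n".join(f"- {item}" for item in enhancements)
    prompt = (
        "Here is the current working python code:\n"
        f"{current_code}\n\n"
        "Please provide the complete updated program implementing these new features:\n"
        f"{feature_list}"
    )
    # The returned code re-enters the same cleanup, install, and execution pipeline
    return ask_model(prompt)
```

Asking for the complete program, rather than a diff, keeps the downstream save-and-run pipeline identical for first drafts and feature updates.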
Testing demonstrates the loop’s practical value across multiple projects and languages. First, a PowerShell app fetches the Bitcoin price on a schedule; the agent then builds a Python Snake game using Anthropic Claude 3.5 Opus. Early failures include an “UnboundLocalError” caused by variable scope; the self-correction prompt fixes it by making the variable properly accessible. Subsequent iterations add levels and color changes, and later an API timeout surfaces—showing that not all failures are purely code-generation issues.
Next, the agent generates a Bitcoin price graph using a free API (CoinDesk), then improves it with a moving average line and interactive zoom/hover features. A tooltip-related syntax error (“unterminated string literal”) is corrected on retry, and the graph renders with the intended hover behavior.
The workflow also adapts beyond Python. In Go, the agent builds a CLI password generator and adds UI elements like a progress bar and strength meter. Finally, a Go CLI scrapes headlines from The Verge; one run fails during package setup, then succeeds after another correction cycle. Even when the output formatting is messy, the agent still returns the requested headlines, and one example required four correction attempts before completing.
Overall, the transcript frames self-correction as a repeatable engineering pattern: treat execution errors as structured feedback, enforce clean code output, validate changes by rerunning, and cap retries to keep the process bounded.
Cornell Notes
The system turns a goal into working software by generating code, installing dependencies, and executing it. If execution fails, it feeds the exact terminal error back into a self-correction prompt that requests corrected code, then retries after cleaning the output. After the program runs, a second loop prompts the model to add new features to the current code, and the agent again validates the update by reinstalling dependencies (if needed) and rerunning. The approach relies heavily on prompt constraints that strip non-code text and remove common formatting artifacts, so the saved file is executable. In tests, the loop fixes real issues like Python variable-scope errors and syntax errors, and it carries over to Go projects as well.
- How does the agent convert a user goal into executable code without getting stuck on messy model output?
- What exactly drives the self-correction loop when code fails?
- How does the agent add features after the initial version works?
- What kinds of errors were corrected during the Snake game test?
- How did the approach handle tooltip-related failures in the Bitcoin graph?
- Does the self-correction workflow generalize beyond Python?
Review Questions
- When and why does the system enforce “remove all text not needed to run the python code,” and what problem does it prevent during execution?
- How does the self-correction prompt use terminal error messages differently from the feature-addition prompt?
- What does “max attempts” change about the agent’s behavior, and how would you choose a value for a new project?
Key Points
1. The agent follows an execution-validated loop: generate code, install dependencies, run it, and only then proceed to improvements.
2. Prompting includes strict output formatting so the model returns clean, executable code without extra text wrappers.
3. Self-correction is driven by the exact terminal error message paired with the current code, then rerun after cleaning and dependency installation.
4. A bounded retry policy (max attempts) prevents endless correction cycles when errors don’t converge.
5. After a successful run, a separate prompt asks for feature additions to the current code, followed by the same validation-and-retry process.
6. The workflow generalizes across languages by swapping dependency-install logic (pip for Python, Go package installs for Go) while keeping the error-to-fix loop.
7. Testing across Snake, Bitcoin graphs, Go CLIs, and scraping tasks shows the approach can fix both runtime exceptions and syntax errors, though external issues like API timeouts can still break runs.