
What even is an AI Agent?! (The Standup)

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

An AI coding agent is best understood as an LLM plus a defined toolset plus a looping process guided by system prompts until completion.

Briefing

AI agents for software development are essentially an LLM wired to programming tools and kept running through iterative “loops” until the task is complete—but the hard part isn’t the core loop. The hard part is making the experience reliable, safe, and usable inside real developer workflows, especially when the agent needs access to a local codebase and the developer’s existing setup.

In a discussion focused on Open Code, the team frames an agent as “LLM + tools + loops + a system prompt” that guides the model to repeatedly call tools that edit files, read project context, and run commands. The loop behavior largely comes from the model itself: when the model can’t proceed without tool results, it stops and requests specific tool calls, then continues after receiving outputs. That design means developers don’t have to handcraft the interruption logic for every step; the model’s stop reasons and tool-call protocol drive the iteration.
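The pattern is simple enough to sketch. Below is a minimal, hypothetical version of that loop; the `Llm` interface, stop reasons, and tool map are illustrative assumptions, not Open Code’s actual API:

```typescript
// Minimal sketch of "LLM + tools + loops": hypothetical types, not Open Code's API.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelTurn =
  | { stopReason: "end"; text: string }            // model considers the task done
  | { stopReason: "tool_use"; calls: ToolCall[] }; // model needs tool results to proceed

interface Llm {
  next(history: unknown[]): Promise<ModelTurn>;
}

type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runAgent(llm: Llm, tools: Record<string, Tool>, task: string): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];
  // The model drives the loop via its stop reason: it halts to request
  // tool calls, and the harness resumes it with the outputs.
  while (true) {
    const turn = await llm.next(history);
    if (turn.stopReason === "end") return turn.text;
    for (const call of turn.calls) {
      const tool = tools[call.name];
      const output = tool ? await tool(call.args) : `error: unknown tool ${call.name}`;
      history.push({ role: "tool", toolCallId: call.id, content: output });
    }
  }
}
```

The harness contains almost no control logic of its own; everything interesting lives in the system prompt, the tool implementations, and the model’s decisions about when to call them.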

Open Code’s differentiation is less about squeezing extra capability from models and more about packaging an agent that fits how programmers already work—particularly in the terminal. The product aims to run the agent on the user’s machine with access to local files, rather than forcing developers to recreate their environment in the cloud. That local-first approach also enables a planned mobile client: the laptop can keep running while the developer steps away, with notifications and the ability to continue the conversation later. The team emphasizes that this matters because most dev environments are local, and remote agents require duplicating configuration that many teams don’t maintain consistently.

A concrete example shows how Open Code reduces hallucinations and improves correctness: edits trigger diagnostics from the Language Server Protocol (LSP). When the agent applies a patch to a file, the system returns LSP errors (for example, TypeScript diagnostics). The agent then uses those diagnostics as feedback to fix issues immediately, correcting mistaken assumptions about missing functions or incorrect types. Open Code runs LSP servers in parallel rather than trying to hijack already-running ones, and it ships out-of-the-box support so users don’t need perfect custom configuration.

The team also tackles why “agentic coding” is still messy in practice. Models differ in how eagerly they call tools, and tool-calling behavior is shaped by training and system prompts; some models require more explicit tool instructions to use the right functions. Because the system prompt and tool schemas can change outcomes in non-obvious ways, the team says it’s building consistent, real-world benchmarks for agent behavior—using a shared codebase and qualitative evaluation—since traditional model benchmarks don’t correlate well with real developer tasks.

Finally, the discussion highlights operational edge cases that don’t show up in weekend demos: handling cancellations mid-tool-call, managing context-window limits, and dealing with the non-determinism of LLM behavior. Open Code addresses long sessions by summarizing history when context pressure rises, while also warning that summaries are lossy. The overall message: building an agent isn’t just about getting the loop working—it’s about engineering the surrounding system so it stays safe, correct, and productive in the messy reality of software development.

Cornell Notes

An AI coding agent is built by combining an LLM with programming tools and a looping mechanism that keeps running until the task is done. Open Code’s approach emphasizes a local-first workflow: the agent runs on the developer’s machine with access to the real codebase and environment, plus a planned mobile client to continue steering work while away. A key reliability feature is LSP diagnostics feedback—after the agent edits files, returned TypeScript (or other) errors guide the next tool calls and reduce hallucinations. Because tool-calling behavior varies by model and system prompts, the team is developing consistent, real-world benchmarks and qualitative evaluation to measure improvements. The hardest engineering work comes from edge cases like cancellations, context limits, and non-deterministic model behavior.

What’s the practical definition of an “AI agent” in software development discussed here?

It’s treated as an LLM connected to a set of programming tools (e.g., edit files, search/read code, run commands) and kept in an iterative loop. The system prompt and tool descriptions steer the model toward the right sequence of actions, and the loop continues until the model’s stop reason indicates completion, pausing whenever it needs tool outputs. The team summarizes the pattern as “LLM + tools + loops + a system prompt.”

How does Open Code use LSP to improve correctness after the agent edits code?

Open Code applies changes through an edit-file tool that takes a patch-like instruction (an old string to find and a new string to put in its place). The tool response includes diagnostics from the Language Server Protocol. For example, after editing a TypeScript file, the system returns the TypeScript LSP errors, and the agent is prompted to fix them immediately. This feedback loop helps correct hallucinations such as calling functions that don’t exist or violating type expectations.
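A rough sketch of what such a tool could look like; the `editFileTool` shape, the `LspClient` interface, and the diagnostic format here are illustrative guesses, not Open Code’s real internals:

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical diagnostic shape; real LSP diagnostics carry ranges, codes, etc.
type Diagnostic = { line: number; severity: "error" | "warning"; message: string };

interface LspClient {
  diagnosticsFor(path: string): Promise<Diagnostic[]>;
}

// Edit tool: apply an old-string/new-string patch, then return LSP
// diagnostics in the tool output so the model sees errors immediately.
async function editFileTool(
  lsp: LspClient,
  args: { path: string; oldString: string; newString: string },
): Promise<string> {
  const source = readFileSync(args.path, "utf8");
  if (!source.includes(args.oldString)) {
    return `error: oldString not found in ${args.path}`;
  }
  // String.replace with a string pattern replaces only the first occurrence.
  writeFileSync(args.path, source.replace(args.oldString, args.newString));

  const diagnostics = await lsp.diagnosticsFor(args.path);
  if (diagnostics.length === 0) return "edit applied; no diagnostics";
  return (
    "edit applied; diagnostics:\n" +
    diagnostics.map((d) => `${d.severity} line ${d.line}: ${d.message}`).join("\n")
  );
}
```

Because the diagnostics ride along in the tool response, no extra orchestration is needed: the next loop iteration simply hands the errors back to the model.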

Why does tool-calling reliability depend on model choice and system prompts?

Tool usage isn’t uniform across models. Some models are tuned to call the provided tools more eagerly (the discussion cites Anthropic’s Claude Code as particularly effective at tool calling), while others may be smarter at reasoning but less proactive about tool use. System prompts can “massage” behavior by explicitly naming tools and instructing the model to follow a tool-driven plan (e.g., a “set up your plan before executing” pattern).
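As a concrete illustration of that “massaging,” here is a hypothetical system prompt and tool list; the wording and tool names are invented for the example:

```typescript
// Hypothetical tool descriptions; names and wording invented for illustration.
const tools = [
  { name: "read_file", description: "Read a file from the project." },
  { name: "edit_file", description: "Apply an old-string/new-string patch to a file." },
  { name: "bash", description: "Run a shell command and return its output." },
];

const systemPrompt = [
  "You are a coding agent working inside the user's repository.",
  "Before executing, set up your plan as a short list of steps.",
  // Naming the tools explicitly nudges less tool-eager models to use them
  // rather than answering from memory.
  `Use only these tools: ${tools.map((t) => t.name).join(", ")}.`,
  "Never guess file contents; call read_file first.",
].join("\n");
```

Small wording changes at this layer can shift how often, and how correctly, a given model reaches for its tools, which is part of why outcomes vary in non-obvious ways.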

What’s the “local-first” product bet behind Open Code?

Most developers keep workable environments locally. Remote agent approaches can work, but they require recreating the developer’s environment in the cloud, which not everyone does reliably. Open Code instead runs the agent on the user’s machine so it can use the existing setup. The planned mobile client connects to a running session so the developer can step away and later continue, with notifications when the agent needs input.

How does the system handle long-running sessions and context-window limits?

When the loop approaches the model’s context window, Open Code pauses and summarizes the conversation history, then continues with the summary rather than the full transcript. The team notes this summary is lossy (a compression step), but it’s effective enough that the experience can feel like “infinite context.” They also recommend starting new sessions for new tasks to reduce noise and preserve signal.
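A sketch of that compaction step, under stated assumptions: the token estimate and thresholds below are invented stand-ins, not Open Code’s actual implementation:

```typescript
type Message = { role: string; content: string };

const CONTEXT_LIMIT = 200_000; // tokens; model-dependent assumption
const COMPACT_AT = 0.8;        // summarize once ~80% of the window is used

// Crude stand-in for a real tokenizer: roughly 4 characters per token.
function estimateTokens(history: Message[]): number {
  return Math.ceil(history.reduce((n, m) => n + m.content.length, 0) / 4);
}

async function maybeCompact(
  history: Message[],
  summarize: (h: Message[]) => Promise<string>, // e.g. an LLM call
): Promise<Message[]> {
  if (estimateTokens(history) < CONTEXT_LIMIT * COMPACT_AT) return history;
  // Lossy compression: whatever the summary drops is gone for the rest of
  // the session, which is why fresh sessions are preferred for new tasks.
  const summary = await summarize(history);
  return [{ role: "system", content: `Summary of the session so far:\n${summary}` }];
}
```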

What kinds of benchmarks does the team say are missing for agentic coding assistants?

Model benchmarks often don’t correlate with real-world developer outcomes. The team argues there’s no strong benchmark that compares multiple agentic coding assistants on the same prompts and codebase with practical criteria like effectiveness, cost, and speed. They’re building consistent, realistic codebase scenarios with qualitative grading, plus instrumentation to inspect diagnostics and tool-call outcomes.
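To make the gap concrete, a benchmark of this kind might pin scenarios and grades down in a shape like the following; this structure is a speculative illustration, not the team’s actual harness:

```typescript
// Speculative scenario/result shapes for an agent benchmark; not the team's real design.
type Scenario = {
  repo: string;   // shared, realistic codebase every assistant runs against
  prompt: string; // identical task prompt for all assistants
};

type Result = {
  assistant: string;
  effective: boolean;       // qualitative grade: did the change actually work?
  costUsd: number;          // practical criteria alongside correctness
  wallClockSeconds: number;
  toolCalls: number;        // instrumentation: diagnostics and tool-call outcomes
};

const scenario: Scenario = {
  repo: "github.com/example/realistic-webapp", // placeholder repo
  prompt: "Fix the failing checkout-flow test without breaking type checks.",
};
```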

Review Questions

  1. How does returning LSP diagnostics after an edit change the agent’s next actions compared with a setup that doesn’t provide compiler feedback?
  2. Why might two different LLMs behave differently even when given the same tool list and task description?
  3. What tradeoffs come with summarizing long session history to manage context windows, and how might that affect agent reliability?

Key Points

  1. An AI coding agent is best understood as an LLM plus a defined toolset plus a looping process guided by system prompts until completion.

  2. Open Code emphasizes a local-first workflow so the agent can use the user’s real dev environment instead of recreating it in the cloud.

  3. LSP diagnostics are integrated into the edit loop: after file edits, returned errors (e.g., TypeScript) steer the agent toward fixes and reduce hallucinations.

  4. Tool-calling behavior varies by model and system prompt design; some models call tools more eagerly, while others need more explicit tool instructions.

  5. Open Code plans a mobile client that connects to a running local session, enabling notifications and continued steering while away from the desk.

  6. Reliability work focuses on edge cases—cancellations mid-tool-call, context-window limits, and non-deterministic model behavior—rather than only improving raw model capability.

  7. The team is building consistent, real-world benchmarks for agent behavior because existing model benchmarks don’t reliably predict developer outcomes.

Highlights

Open Code treats agent loops as tool-driven stop-and-resume behavior: the model halts when it needs tool results, requests them, then continues after receiving outputs.
LSP diagnostics are fed back immediately after edits, turning compiler/type errors into direct guidance for the next agent step.
The product bet is local-first: keep the agent running on the developer’s machine and add a mobile client to continue the conversation without rebuilding a cloud environment.
The team argues tool-calling isn’t solved by “one prompt fits all”—model differences and system prompt structure strongly affect whether tools get used correctly.
Long sessions are managed with lossy summarization, and the team recommends starting new sessions to reduce noise and preserve signal.

Topics

Mentioned

  • LLM
  • LSP
  • TUI
  • IDE
  • SWE
  • ARC AGI