
Long Running AI Agents | On The Edge #4

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

A long-running AI agent is an autonomous, stateful process that can persist over time, call external tools, and update outputs without constant user prompts.

Briefing

Long-running AI agents are best understood as autonomous, stateful processes that keep working for hours or days—planning multi-step goals, calling external tools, and persisting progress via memory and checkpoints—without needing constant user prompts. The practical challenge is controlling how long the agent runs and ensuring it stops at the right moment, especially when tool calls and web research can keep expanding. This walkthrough focuses on a simple, time-boxed framework to make that control concrete.

The setup defines a “long-running AI agent” as an autonomous, stateful software process that can persist for a long window while maintaining memory and checkpoints. Instead of aiming for days, the demo uses short, timed tasks to prove the mechanism: the agent should research for a fixed duration, update a live web page in real time, and hard-stop when the time window ends. To do that, the system uses MCP servers as external tool providers—specifically custom MCP tools for searching X (including handles and popular posts) and for doing similar research on Reddit. Built-in tools like web search and fetch are also available, so MCP is used where it adds targeted capabilities.

A key operational rule is embedded directly into the prompt: for time-based tasks, the agent must adhere to the allotted time and must never stop working before the time is up. The initial job is explicitly time-boxed to five minutes. The agent is instructed to perform five minutes of research on “Sora 2,” then produce an in-depth article with three paragraphs and bullet points, aiming to find a Sora 2 video to embed. Crucially, it must keep updating and refining the existing content rather than simply appending new text. A background timer (implemented with a sleep call) tracks the 300-second window, and the agent is expected to stop research exactly when the timer expires.
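The control loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the video's actual implementation: `research_step` and `update_artifact` are hypothetical placeholders for the demo's web-search/MCP calls and live-page updates, and the deadline check stands in for the background sleep timer.

```python
import time

def run_timeboxed(research_step, update_artifact, window_s=300):
    """Repeat research until the window expires, then hard-stop.

    research_step() returns a new finding; update_artifact(finding)
    refines the working artifact. Both are stand-ins for the demo's
    actual tool calls and page updates.
    """
    deadline = time.monotonic() + window_s  # 300 s = the 5-minute run
    while time.monotonic() < deadline:
        finding = research_step()
        update_artifact(finding)
    # Hard stop: once the deadline passes, no further research happens.
    return "time is up"
```

Checking the deadline between steps (rather than relying on the model alone) is what makes the stop condition reliable; the prompt rule keeps the agent working up to, but not past, that boundary.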

During the five-minute run, the live page updates as the agent performs web searches and MCP-backed queries against X and Reddit. When the timer hits “time is up,” the process stops as intended. The results show the agent generated content but failed to embed an actual Sora 2 video; instead, it produced a plausible-sounding but imperfect title and summary. The point, however, is less about perfect research and more about verifying that the agent respects the time boundary and can iteratively update a working artifact.

A second run extends the window to ten minutes using the same prompt structure, with the timer adjusted to 600 seconds. Again, the agent stops when the time expires, confirming the time-boxing behavior scales at least from five to ten minutes. The longer run adds more substance—an image appears—but also risks producing “too much information” relative to what the prompt asked for, suggesting that prompting and constraints still matter even when the runtime control is correct.

The demo closes by connecting this approach to broader claims about long-horizon agents. It references an Anthropic observation that Claude Sonnet 4.5 maintained focus for more than 30 hours on complex multi-step tasks, implying that longer durations are possible but require a stronger framework than a simple time-box. The takeaway is that time-based, long-running agents can be built with MCP tool calls, real-time artifact updates, and strict stop conditions—making it easier to experiment with longer autonomous workflows in a controlled way.

Cornell Notes

Long-running AI agents are autonomous, stateful processes that can persist for extended periods while planning multi-step goals, calling external tools, and updating outputs without constant user prompts. This demo implements a time-boxed approach: the agent runs for a fixed window (5 minutes, then 10 minutes), performs research via MCP servers (X and Reddit tools) plus built-in web tools, and continuously updates a live web page. A background timer enforces a hard stop exactly when the allotted time ends, preventing runaway tool usage. The 5-minute run produced iterative content but missed the requested Sora 2 video embed; the 10-minute run added more detail (including an image) but also risked exceeding the prompt’s intent. The core value is proving reliable runtime control and iterative refinement under strict time constraints.

What makes an AI agent “long-running” in this framework, and how is that different from a one-shot chatbot response?

Here, “long-running” means an autonomous stateful process that persists across time, maintains memory/checkpoints, plans and executes multi-step goals, and can keep working while calling external tools. Instead of generating a single response and stopping, the agent repeatedly updates an artifact (a live web page) while a timer window remains active.
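The “update the artifact, don't just append” behavior can be sketched as an in-place section replacement. The marker-comment convention below is a hypothetical illustration (not from the video) of how a live page might be refined rather than grown:

```python
import re

# Hypothetical convention: the editable article body sits between
# two marker comments in the live page's HTML.
ARTICLE_RE = re.compile(
    r"(<!-- article:start -->).*?(<!-- article:end -->)", re.S
)

def refine_artifact(page_html, new_body):
    """Replace the article section in place instead of appending,
    mirroring the 'update and refine existing content' instruction."""
    return ARTICLE_RE.sub(
        lambda m: f"{m.group(1)}\n{new_body}\n{m.group(2)}", page_html
    )
```

Each research pass rewrites the section between the markers, so the page stays a single refined draft instead of an ever-growing log of findings.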

How does the demo enforce that the agent stops at exactly the allotted time?

A background timer is started with a sleep call (300 seconds for 5 minutes, 600 seconds for 10 minutes). The prompt includes a hard constraint: for time-based tasks, the agent must adhere to the allotted time and must not stop before the time is up. When the timer reaches “time is up,” the process halts research, matching the intended hard-stop behavior.

What role do MCP servers play in the research workflow?

MCP servers provide targeted external tool capabilities. In the demo, custom MCP tools support searching X (including handles and popular posts) and searching Reddit for posts and research. Built-in tools like web search and fetch are also available, but MCP is used for the specialized X/Reddit research actions.
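One way to picture the tool split is a single registry where built-in and MCP-provided tools share an interface and the agent picks by capability. The tool names and stub functions below are hypothetical, standing in for a real MCP client:

```python
# Stubs standing in for real tool backends; in the demo these would be
# the built-in web tools and the custom MCP servers for X and Reddit.
def web_search(query):     # built-in: generic web search
    return f"web results for {query!r}"

def search_x(query):       # via custom MCP server: handles, popular posts
    return f"X posts about {query!r}"

def search_reddit(query):  # via custom MCP server: posts and research
    return f"Reddit threads about {query!r}"

TOOLS = {
    "web_search": web_search,        # built-in
    "search_x": search_x,            # MCP-provided
    "search_reddit": search_reddit,  # MCP-provided
}

def call_tool(name, query):
    """Dispatch a research query to the named tool."""
    return TOOLS[name](query)
```

The point of the uniform interface is that the agent's planning loop doesn't care whether a tool is built-in or MCP-backed; MCP simply extends the registry with the specialized X/Reddit actions.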

Why did the 5-minute run fall short on the “embed a Sora 2 video” requirement?

The agent generated content and performed research, but it did not successfully embed an actual Sora 2 video. The resulting title and framing looked plausible yet imperfect, and the run also produced limited image/video assets. The demo treats this as acceptable because the primary test is runtime control and iterative updating rather than perfect retrieval.

What changes when the time window increases from 5 minutes to 10 minutes?

The agent again stops exactly when the timer expires, confirming the time-boxing mechanism works for longer windows. With more time, it produces more output—adding an image in the 10-minute run—but it can also drift toward “too much information” relative to what the prompt asked for, highlighting the need for tighter prompting constraints.

Review Questions

  1. How does a time-boxed instruction in the prompt work together with a background timer to prevent an agent from running past its allowed window?
  2. What specific tool categories are used for research in this setup (MCP vs built-in tools), and what are the examples of each?
  3. What trade-off appears when increasing the allowed runtime from 5 minutes to 10 minutes, and how might prompting need to change to address it?

Key Points

  1. A long-running AI agent is an autonomous, stateful process that can persist over time, call external tools, and update outputs without constant user prompts.

  2. Time-based control can be implemented with a strict timer plus prompt rules that require adherence to the allotted window and a hard stop at expiry.

  3. MCP servers can supply specialized external research tools, such as searching X and Reddit, while built-in tools handle generic web search and fetching.

  4. Iterative refinement matters: the agent is instructed to update and improve existing content rather than only adding new text.

  5. Short time windows (5 minutes) can validate runtime behavior even if retrieval quality (like embedding a specific video) is imperfect.

  6. Longer windows (10 minutes) increase output richness (e.g., adding images) but also raise the risk of producing more information than the prompt intends.

  7. Scaling to multi-hour or multi-day focus likely requires more than simple time-boxing, aligning with claims about longer-horizon agent performance in the field.

Highlights

A background sleep timer (300s/600s) combined with a “hard stop” prompt constraint makes the agent stop research exactly when the time window ends.
Custom MCP tools for X and Reddit enable ongoing research while a live web page is updated in real time.
The 5-minute run respected the time limit but failed to embed an actual Sora 2 video—quality wasn’t the main success metric.
Extending to 10 minutes produced more detail (including an image) but also risked overshooting the prompt’s intended scope.
