
24/7 Claude Code AI Agent 12-Day Review: The Results Will Surprise You

All About AI · 6 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The agent maintained about 95% uptime over roughly 12 days, with one Chrome debug-related issue later fixed.

Briefing

A 12-day run of an autonomous “Claude Code” agent on a Mac mini is producing steady social growth—about 292 new X followers from zero—while maintaining roughly 95% uptime. The most striking early outcome is the agent’s ability to execute scheduled and event-driven tasks reliably enough to generate measurable attention, with follower gains averaging around 50 per day in recent days.

Uptime has stayed high despite a hiccup involving a Chrome debug browser version, which was later fixed. The background workflow runs a script that loads cron jobs and executes Claude Code commands using the terminal flag "-p", leveraging Claude Code's support for running one-shot prompts that way. In practice, the system has been stable enough to keep the experiment moving without constant manual intervention.
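The script itself isn't shown in the video, but a minimal sketch of the pattern described (a loop that reads scheduled jobs and shells out to Claude Code's non-interactive print mode) might look like the following. The jobs.json layout, prompts, and intervals are all assumptions; only the "claude -p" invocation is a real Claude Code CLI feature.

```python
import json
import subprocess
import time

JOBS_FILE = "jobs.json"  # hypothetical layout: [{"prompt": "...", "interval_s": 3600}, ...]

def run_job(prompt: str) -> str:
    """Run one non-interactive Claude Code prompt; `claude -p` prints a result and exits."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=600,  # don't let one stuck job block the whole loop
    )
    return result.stdout

def main() -> None:
    with open(JOBS_FILE) as f:
        jobs = json.load(f)
    last_run = [0.0] * len(jobs)
    while True:  # keep the agent working around the clock
        now = time.time()
        for i, job in enumerate(jobs):
            if now - last_run[i] >= job["interval_s"]:
                print(run_job(job["prompt"]))
                last_run[i] = now
        time.sleep(30)  # coarse tick; a real cron entry would also work

if __name__ == "__main__":
    main()
```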

On X, engagement results show both reply-driven reach and standalone posting performance. The most engaging reply, left on February 11, pulled roughly 25,000 views along with substantial likes and reposts. Other posts landed in the 10,000-view range, but that February 11 reply stands out as the top performer. For raw reach from posting, a reaction-style update tied to a YouTube reaction video generated about 2,400 views plus a smaller number of likes.

YouTube metrics add another layer: in the last 28 days the channel reached about 10,400 views, with the last 12 days driving much of that momentum. Subscriber growth is reported at 325 total, with the creator estimating around 160 net new subscribers during the period. View averages are described as roughly 3,400 per video, including longer videos that still attracted a few hundred views each. A key workflow improvement is using the agent to generate “reaction-style” videos, which the creator says turn out consistently well.

Not everything has been smooth. Early missteps included double posting—comments and replies going out twice—after which the system was adjusted. Direct messages also caused trouble: the agent sent a DM exchange that led to the creator being tagged, described as a “crisis” and a strange behavior pattern worth addressing. Still, the overall count of failures is framed as limited compared with the successes.

Cost remains manageable for this scale. The agent ran on Claude Code’s $100 tier “max plan,” with spending estimated at about $50 after nearly two weeks. Additional production expenses—video and thumbnail creation—are estimated at roughly $30 more, putting total costs around $80 so far.

Looking ahead, the agent is being positioned as a contributor to a skills marketplace site, "skillsmd.store," where skills are packaged as "skill.md" files. The plan is to add more high-value skills and let the agent generate promotional snippets for the site. Early monetization is already present, with a reported ~$30 in revenue from a few sales. The creator also signals a separate upcoming video focused on the "biggest accomplishment" so far, suggesting there's a standout result beyond follower and view counts.

Overall, the experiment is framed as a success: strong execution of predefined skills, consistent reaction-video output, and a promising attention-growth trajectory—especially on X, where responsiveness appears to be highest.

Cornell Notes

Over 12 days, an autonomous Claude Code agent running on a Mac mini delivered high reliability (about 95% uptime) and measurable audience growth. Starting from zero on X, it reached 292 followers, with recent gains averaging around 50 per day, and its top reply generated roughly 25,000 views. On YouTube, the channel reached about 10,400 views in the last 28 days and about 325 subscribers total, with an estimated 160 net new subscribers during the period; reaction-style videos are averaging a few thousand views each. Failures were mostly operational—double posting and an ill-advised DM exchange—rather than systemic breakdowns. Costs were kept relatively low by using Claude Code’s $100 max plan (about $50 spent in nearly two weeks) plus an estimated ~$30 for production work.

What concrete metrics show the agent is producing real outcomes rather than just running in the background?

The experiment tracks multiple measurable outputs: X followers rose from zero to 292 in about 12 days, and the creator reports recent daily gains around 50 followers. Engagement is quantified too: a February 11 reply reached about 25,000 views with many likes and reposts, while other posts landed around 10,000 views. On YouTube, the channel reached about 10,400 views in the last 28 days and about 325 subscribers total, with the last 12 days driving much of the change; average video performance is described as roughly 3,400 views per video.

How reliable is the agent’s automation, and what technical detail supports that reliability?

Uptime is described as very good, around 95%, with one issue tied to a Chrome debug browser version that was later fixed. The background workflow runs a script that loads cron jobs and executes Claude Code commands in the terminal using the "-p" flag, leveraging Claude Code's support for that non-interactive execution mode. That combination is presented as a key reason the system can keep tasks running consistently.

What kinds of failures occurred, and what do they reveal about where automation needs guardrails?

The most serious recurring failure was double posting, where comments and replies went out twice. Another problematic behavior involved direct messages: the agent sent a DM exchange that resulted in the creator being tagged, described as a crisis and “strange” enough to question whether the test agent should handle DMs at all. Together, these failures point to the need for stricter deduplication and tighter permissions around private messaging.
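The video doesn't show how the double-posting fix was implemented, but one common guardrail is to fingerprint every outgoing post and refuse duplicates. A minimal sketch, assuming a hypothetical post_to_x() helper:

```python
import hashlib
import json
from pathlib import Path

SENT_LOG = Path("sent_posts.json")  # persisted across runs so restarts can't re-post

def _fingerprint(text: str) -> str:
    # Normalize lightly so trivial whitespace/case changes still count as duplicates.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

def guarded_post(text: str) -> bool:
    """Post only if this exact text hasn't gone out before; returns True if posted."""
    sent = set(json.loads(SENT_LOG.read_text())) if SENT_LOG.exists() else set()
    digest = _fingerprint(text)
    if digest in sent:
        return False  # duplicate suppressed
    # post_to_x(text)  # hypothetical posting call; DM sending stays disabled entirely
    sent.add(digest)
    SENT_LOG.write_text(json.dumps(sorted(sent)))
    return True
```

The same gate doubles as a permissions boundary: by routing every outbound message through one function, private-channel actions like DMs can simply be left unwired.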

How were costs estimated, and why does the Claude Code plan choice matter?

The agent used Claude Code's max plan at the $100 tier. The creator estimates spending at about $50 after nearly two weeks, then adds production costs for videos and thumbnails, estimated around $30, bringing total costs to roughly $80 so far. The plan choice is linked to a technical capability: using "claude -p" to query Claude Code via an SDK call and exit, which the creator says works well for this setup.

What is the forward plan for turning the agent’s work into something reusable or monetizable?

The agent is being positioned as a contributor to "skillsmd.store," with skills packaged as "skill.md" files. The plan is to add more high-value skills and have the agent create small promotional snippets for the site. Early monetization is already happening: the creator reports the site has made a few sales and generated about $30 in revenue so far.

Why are reaction-style YouTube videos a focal point in the workflow?

Reaction-style videos are treated as a repeatable success pattern. The creator describes a “really good workflow” for using the agent to generate these videos and says the results are consistent and turn out well. They also highlight a specific skill used for YouTube reaction videos (with 47 description tokens) and express strong satisfaction with how reliably the agent follows the skill instructions to execute the desired output.
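The skill file itself isn't shown in the video. Claude Code skills are typically markdown files with a short YAML header whose description field is what gets loaded up front (plausibly the "47 description tokens" mentioned above), with the full instructions loaded on demand. A hypothetical sketch, with every field value invented:

```markdown
---
name: youtube-reaction-video
description: Draft a reaction-style video script, caption, and thumbnail
  prompt from a source YouTube URL. Kept short so it loads cheaply.
---

1. Pull the source video's title and a few key moments.
2. Write a short reaction script with timestamps.
3. Produce a post caption and a thumbnail prompt, then stop for review.
```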

Review Questions

  1. Which single metric best captures the agent’s reliability in this 12-day run, and what percentage was reported?
  2. What two categories of failures were most prominent, and how did they affect posting behavior?
  3. How does the creator connect Claude Code’s pricing/flags to both cost control and the agent’s ability to execute tasks?

Key Points

  1. The agent maintained about 95% uptime over roughly 12 days, with one Chrome debug-related issue later fixed.
  2. X follower growth went from zero to 292 in 12 days, with recent gains averaging around 50 followers per day.
  3. The top X engagement came from a February 11 reply that reached about 25,000 views, outperforming other posts in the ~10,000-view range.
  4. YouTube performance improved alongside the automation: about 10,400 views in the last 28 days and roughly 325 subscribers total, with reaction-style videos averaging a few thousand views.
  5. Operational failures were mainly behavioral (double posting and an ill-advised DM exchange), suggesting the need for deduplication and stricter DM permissions.
  6. Costs were kept relatively controlled by using Claude Code's $100 max plan (about $50 spent in nearly two weeks) plus an estimated ~$30 for production work, totaling around $80 so far.
  7. Future plans center on building and promoting reusable skills via skillsmd.store, with early revenue reported at about $30 from a few sales.

Highlights

From zero to 292 X followers in 12 days, with recent daily gains around 50—an attention-growth signal tied to ongoing automation.
A February 11 reply reached roughly 25,000 views, making it the experiment’s standout engagement moment on X.
Uptime was reported near 95%, supported by running Claude Code commands in the terminal using the “-p” flag.
Double posting and a DM mishap were the main failure modes, pointing to guardrail needs for automation.
Total spend was estimated at about $80 so far: ~$50 on Claude Code plus ~$30 for video and thumbnail production.

Topics

  • Claude Code Agent
  • X Growth
  • YouTube Reaction Videos
  • Automation Reliability
  • Skills Marketplace

Mentioned

  • X