AI Makes You Dumb and Slow
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
A randomized controlled trial found that allowing AI assistance increased observed implementation time by about 19% for experienced open-source developers on real repo issues.
Briefing
A randomized controlled trial of early-2025 AI coding tools found a counterintuitive result: experienced open-source developers took about 19% longer to complete real, repo-based tasks when AI assistance was allowed—an outcome that clashes with both developer expectations and expert forecasts. The study frames this as a snapshot of tool capability in a realistic setting, not a claim about AI’s ultimate potential. Still, the magnitude is striking enough to force a rethink of how “productivity” should be measured when AI changes the workflow.
Researchers recruited 16 experienced contributors from large open-source repositories, averaging roughly five years of prior experience per developer. Across 246 issues (bug fixes, features, and refactors), each issue was randomly assigned to one of two conditions: AI use allowed or AI use disallowed. When AI was allowed, developers could use the tools of their choice, primarily Cursor Pro with Claude 3.5 Sonnet and other frontier models available at the time of the study. The core metric was implementation time for each issue, with developers recording their screens and self-reporting total implementation time.
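To make the measurement concrete, the sketch below shows one way randomized per-issue timings could be aggregated into a single slowdown estimate. The briefing does not specify the study's actual estimator, so the ratio-of-geometric-means summary, the helper function, and the timing values here are illustrative assumptions rather than the paper's analysis.

```python
# Illustrative sketch only: assumes the slowdown is summarized as a ratio of
# geometric means of per-issue implementation times (a common choice for
# right-skewed timing data); the numbers below are placeholders, not study data.
import math

times_ai_allowed = [2.5, 4.0, 1.8, 6.2, 3.1]     # hypothetical hours per issue, AI allowed
times_ai_disallowed = [2.1, 3.4, 1.6, 5.0, 2.8]  # hypothetical hours per issue, AI disallowed

def geometric_mean(xs):
    # exp of the mean log time; less sensitive to a few very long issues
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

ratio = geometric_mean(times_ai_allowed) / geometric_mean(times_ai_disallowed)
print(f"estimated slowdown factor: {ratio:.2f}")  # values above 1.0 mean AI-allowed issues took longer
```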
The headline finding: with AI enabled, realized completion times increased by about 19% (the transcript also mentions a “99% longer” figure, but the concrete number referenced repeatedly is 19%). Developers’ own forecasts were far more optimistic. Before starting, participants predicted AI would cut their completion time by about 24%; afterward, they still believed it had sped them up by roughly 20%, despite the observed slowdown. That gap between perception and measured performance is one of the study’s most important signals: self-reports and intuition may not track actual throughput when AI is involved.
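To see how large that perception gap is, the percentages above can be restated as multipliers on baseline completion time; this is plain arithmetic on the figures already quoted, not additional data from the study.

```python
# Convert the quoted percentages into time multipliers relative to the no-AI baseline.
predicted_before = 1 - 0.24  # pre-task forecast: 24% faster -> 0.76x baseline time
believed_after = 1 - 0.20    # post-task belief: ~20% faster -> 0.80x baseline time
observed = 1 + 0.19          # measured outcome: 19% slower -> 1.19x baseline time

print(f"forecast multiplier:          {predicted_before:.2f}x")
print(f"post-task belief multiplier:  {believed_after:.2f}x")
print(f"observed multiplier:          {observed:.2f}x")
# How much more time tasks actually took than developers expected going in (~1.57x).
print(f"observed vs. forecast:        {observed / predicted_before:.2f}x")
```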
The trial also digs into why benchmark-style success might not translate into real coding speed. The researchers argue that common software benchmarks often trade realism for scale: tasks can be self-contained, algorithmically scored, and missing the messy context of live repositories, human review standards, and iterative debugging. In real development, AI can produce code that is functionally correct but verbose, or it can fragment work into smaller subtasks without reducing total effort. Even when AI helps, downstream costs—review time, cleanup, and integration—may erase speed gains.
To address concerns about experimental validity, the study reports that the slowdown persisted across multiple analyses and outcome measures, and it attempted to rule out artifacts such as differential dropout or quality differences in submitted pull requests. It also lists plausible contributing factors, including over-optimistic forecasts, repository familiarity effects, reduced AI reliability in large complex codebases, low acceptance rates for AI generations (under 44%), and missing implicit repository context.
The transcript emphasizes that the result is not a universal verdict on AI. The authors explicitly avoid claiming that AI never speeds up developers or that these developers and repositories represent all software work. Instead, the study is positioned as evidence that early-2025 AI tools can slow experienced contributors in a specific, high-standard environment—and that reconciling benchmark scores, anecdotes, and field performance likely requires multiple evaluation methods. The takeaway for the broader AI debate: measuring “capability” isn’t the same as measuring “usefulness,” and workflow friction can dominate outcomes even when models look strong on tests.
Cornell Notes
A randomized controlled trial of early-2025 AI coding assistance found that experienced open-source developers completed real repository tasks about 19% more slowly when AI use was allowed. The slowdown conflicted with both developer self-forecasts (predicting ~24% faster) and post-task beliefs (still estimating ~20% faster), highlighting a large perception gap. The study used 16 developers and 246 issues from large, complex repos, with AI access primarily via Cursor Pro using Claude 3.5 Sonnet and other frontier models. Researchers argue that benchmark success may not translate to real productivity because real coding includes context, review standards, integration, and cleanup costs. The results are framed as setting-specific evidence rather than a universal claim about AI’s future impact.
- How did the study measure “productivity” and why does that matter for interpreting the results?
- What was the experimental design, and how did randomization support causal claims?
- Why might developers believe AI speeds them up even when measured time increases?
- How do benchmark results and real-world coding outcomes diverge in this framing?
- What specific factors were proposed to explain the slowdown?
- What does the study avoid claiming, and why is that boundary important?
Review Questions
- What does the study treat as a proxy for task difficulty, and how does that proxy help isolate the effect of AI on completion time?
- List at least three mechanisms that could turn “AI-generated progress” into longer real completion times in a repository workflow.
- Why might benchmark scores and anecdotal reports both be compatible with a field trial showing a slowdown?
Key Points
1. A randomized controlled trial found that allowing AI assistance increased observed implementation time by about 19% for experienced open-source developers on real repo issues.
2. Developer forecasts and post-task beliefs were substantially more optimistic than the measured outcome, indicating a major perception gap about AI’s speed impact.
3. Measuring time-to-completion helps avoid misleading productivity proxies like lines of code or number of subtasks, which AI can inflate without reducing effort.
4. Benchmark-style success may not translate to real productivity because real coding includes implicit repository context, integration, and human review/cleanup overhead.
5. The study reports the slowdown persisted across multiple analyses and attempted to rule out common experimental artifacts such as differential dropout or quality differences in submissions.
6. Low acceptance of AI generations and frequent cleanup of AI-produced code are central to the proposed explanation for why AI can slow experienced contributors.
7. The findings are framed as setting-specific evidence rather than a universal claim about AI’s future ability to accelerate software work.