New Research On Copilot And Code Quality

TL;DR

Churn accelerated in open-source repositories, shifting from roughly six-month revision cycles to about two-week cycles on average.

Briefing

AI coding assistants are boosting short-term output while accelerating long-term code churn, an outcome that shows up in measurable changes to how open-source repositories evolve. The most striking finding is a shift in churn frequency: code lines are being changed far sooner after being written, with revisions that used to arrive over roughly six months now appearing on an average cycle of about two weeks. Taken at face value, that shortens the revision window more than tenfold (about 26 weeks down to two) and compresses the time in which developers must revisit and revise what they just produced.

The research behind these claims draws on a large dataset—about 21 million lines of changed code—tracking multiple dimensions of repository change rather than relying on productivity metrics alone. Across those dimensions, the pattern is consistent: developers are adding code faster, but they are getting worse at revising and refactoring it. The data points to more copy-pasted code, more duplicated blocks, and faster revision cycles, which together suggest that code is being committed and deployed before it has been properly validated or integrated into a maintainable structure.

A key “2024 milestone” is the balance tipping from refactoring to duplication. In earlier periods, “moved” code (code refactored or relocated from one place to another) made up a larger share of change operations. By 2024, copy-pasted code surpasses moved code for the first time in the measured history. The implication is not that duplication is new, but that the repo maintenance burden shifts toward co-maintaining multiple near-identical implementations of the same logic.

The transcript also sharpens what “copy-paste” means in this analysis. It includes cases where identical lines are duplicated within a single commit, not merely duplicated across a year of edits. That matters because it points to assistant-driven workflows where developers can accept multi-line suggestions quickly, sometimes with the AI effectively encouraging duplication by generating the same block again rather than calling an existing utility. The research further distinguishes “duplicated blocks” that are at least five lines long and already exist elsewhere in the repo; commits containing such blocks rose from about 0.5% in 2022 to 6.66% in 2024.
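The duplicated-block definition above (five or more lines that already exist elsewhere in the repo) can be illustrated with a minimal sketch. This is not the tooling used in the research; the function names, the whitespace normalization, and the choice to skip blank lines are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

MIN_BLOCK = 5  # threshold from the text: blocks of at least five lines


def normalize(line: str) -> str:
    """Strip surrounding whitespace so re-indentation does not hide a match."""
    return line.strip()


def block_index(lines: List[str], size: int = MIN_BLOCK) -> Set[Tuple[str, ...]]:
    """Return every run of `size` consecutive non-blank lines as a tuple."""
    cleaned = [normalize(l) for l in lines if normalize(l)]
    return {tuple(cleaned[i:i + size]) for i in range(len(cleaned) - size + 1)}


def has_duplicated_block(added: List[str], existing: List[str],
                         size: int = MIN_BLOCK) -> bool:
    """True if the added lines contain a block that already exists elsewhere."""
    return not block_index(added, size).isdisjoint(block_index(existing, size))
```

A commit would be flagged when `has_duplicated_block(added_lines, rest_of_repo_lines)` returns `True`. A production tool would likely also filter trivial lines (lone braces, imports) before matching; that step is omitted here.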

Churn-linked deletions are another warning sign. The most concerning deletions are “churn type” deletions: code that is newly authored and then deleted within about two weeks. The share of new lines changing within two weeks reached roughly 70% in the past year, while longer-horizon revisions (after one to two years) dropped to about a third of their earlier level. This aligns with defect-rate findings reported elsewhere: increased AI adoption correlates with higher defect rates, with one cited estimate suggesting a 25% increase in AI usage could raise defects by about 7%.

The discussion frames the problem as incentives and workflow, not just a single tool. If teams treat productivity as lines added or commits made, assistants that make it easy to “press Tab” will naturally increase the volume of code that later needs correction. The proposed remedy is cultural and technical: measure productivity in ways that reward reuse, consolidation, and maintainability; and for developers—especially juniors—differentiate by understanding the existing codebase well enough to reuse, refactor, and incrementally improve canonical implementations rather than recreating them in multiple places.

Cornell Notes

The research discussed links AI coding assistants to measurable declines in code quality signals, even when productivity rises. The clearest result is faster churn: code lines that used to be revised on a roughly six-month cadence are now being changed on an average cycle closer to two weeks in open source. The shift is driven by more duplication—copy-pasted code and duplicated blocks—along with more “churn type” deletions where newly written code is removed within about two weeks. In 2024, copy-pasted code overtakes moved/refactored code for the first time in the measured history, suggesting maintainability is worsening. The findings matter because duplication and rapid revision increase the odds of defects and create long-term co-maintenance burdens across large repos.

What does “churn” mean here, and why is the two-week cycle considered a major red flag?

Churn refers to how quickly code that was recently written gets changed again. The analysis reports that in open-source projects revisions used to arrive on roughly a six-month cadence, but that this has accelerated to roughly a two-week cycle. That compression means developers must revisit their own newly committed work much sooner, which is consistent with code being written, deployed, and then found wanting, forcing quick follow-up edits in place of deliberate refactoring.
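To make the two-week window concrete, here is a small sketch of how a churn share could be computed from per-line history. The data shape (`authored_at`, `first_changed_at`), the 14-day constant, and the choice of denominator (only lines that were revised at all) are assumptions for illustration; the research may define the ratio differently.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

CHURN_WINDOW = timedelta(days=14)  # "about two weeks", as described in the text

# Hypothetical record: (authored_at, first_changed_at or None if never changed)
LineHistory = Tuple[datetime, Optional[datetime]]


def churn_share(histories: List[LineHistory],
                window: timedelta = CHURN_WINDOW) -> float:
    """Fraction of revised lines whose first change came within `window`."""
    revised = [(a, c) for a, c in histories if c is not None]
    if not revised:
        return 0.0
    quick = sum(1 for a, c in revised if c - a <= window)
    return quick / len(revised)
```

Under this definition, a value near 0.7 would correspond to the roughly 70% figure quoted for the past year.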

How does the research define “copy-pasted” code, and what makes the “within a single commit” detail important?

Copy-paste here includes cases where identical lines are duplicated within a single commit, not just duplicated somewhere else in the repository over time. The transcript contrasts this with “close enough” duplication and with variable- or keyword-level differences. The within-commit framing matters because it suggests the assistant is enabling rapid acceptance of repeated multi-line patterns (e.g., a repeated JavaScript summing function or repeated CSS selector blocks) rather than developers carefully reusing an existing abstraction.

What changed in 2024 that signals a structural shift in how code evolves?

A “2024 milestone” is that copy-pasted code surpasses moved/refactored code for the first time in the measured history. Earlier years showed moved code (refactoring or relocating code from one file to another) taking a larger share of change operations. By 2024, moved code has plummeted to a smaller share than copy-paste, implying teams are increasingly duplicating implementations instead of consolidating them.
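The distinction between moved and copy-pasted lines can be sketched as a per-commit classification. The heuristic below is an assumption, not the research's actual method: an added line counts as moved if an identical line was deleted in the same commit, as copy-pasted if it already exists elsewhere in the repo or earlier in the same commit, and as new otherwise.

```python
from collections import Counter
from typing import Dict, List


def classify_added_lines(added: List[str], deleted: List[str],
                         existing: List[str]) -> Dict[str, int]:
    """Count added lines as 'moved', 'copy_pasted', or 'new' (hypothetical heuristic)."""
    deleted_pool = Counter(l.strip() for l in deleted if l.strip())
    existing_set = {l.strip() for l in existing if l.strip()}
    seen_in_commit = set()
    counts = {"moved": 0, "copy_pasted": 0, "new": 0}
    for raw in added:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no signal
        if deleted_pool[line] > 0:
            # A matching deletion in the same commit: treat as relocation.
            deleted_pool[line] -= 1
            counts["moved"] += 1
        elif line in existing_set or line in seen_in_commit:
            # Already present elsewhere, or repeated within this commit.
            counts["copy_pasted"] += 1
        else:
            counts["new"] += 1
        seen_in_commit.add(line)
    return counts
```

Summing these counts per year would give the kind of moved versus copy-pasted shares the milestone refers to, although a real analysis would need to ignore boilerplate lines that repeat naturally.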

Why are duplicated blocks (e.g., five or more identical lines) treated as especially risky?

Duplicated blocks are risky because they create multiple implementations of the same idea that may not change together. The transcript cites a rise in commits involving large duplicated blocks: about 0.5% in 2022 versus 6.66% in 2024 for blocks of at least five identical lines already existing elsewhere in the repo. If one copy gets updated and others don’t, bugs can emerge from inconsistent behavior—an issue well documented in software maintenance research.
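A toy example shows how duplicated logic drifts apart. All function names and the discount rule are hypothetical; the point is only that a fix applied to one copy does not reach the other, while a single canonical helper cannot diverge from itself.

```python
def apply_discount_checkout(price: float) -> float:
    # Copy 1: later patched so the result never goes below zero.
    return max(price - 5.0, 0.0)


def apply_discount_invoice(price: float) -> float:
    # Copy 2: pasted from the original version and never received the patch.
    return price - 5.0


def apply_discount(price: float, amount: float = 5.0) -> float:
    """Canonical helper: one place to fix, so every caller stays consistent."""
    return max(price - amount, 0.0)
```

For a price of 3.0 the checkout copy returns 0.0 while the invoice copy returns -2.0, which is exactly the kind of inconsistent behavior the text warns about.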

How do “churn type” deletions connect to defect risk?

Churn type deletions are newly authored code that gets deleted within about two weeks. The transcript reports that the share of code changing within two weeks reached about 70%, while longer-horizon revisions (after one to two years) fell to about a third of earlier levels. This pattern aligns with defect-rate findings cited from Google’s report: higher AI adoption correlates with higher defect rates (one estimate: a 25% increase in AI adoption leading to ~7% more defects), consistent with code being integrated before it’s fully correct.

What role does assistant suggestion granularity (single-line vs multi-line) play in duplication?

The discussion suggests that multi-line suggestions make duplication easier because they can reproduce larger blocks quickly. If suggestions were restricted to single-line autocomplete, the assistant would be less able to automatically duplicate a 10-line block in one action. The transcript notes that such a single-line-only option was not seen as widely offered, implying current assistant UX may be structurally conducive to duplication.

Review Questions

Which measurable signals in the dataset point to worsening maintainability rather than just higher productivity?
Why does duplication within a single commit matter more than duplication observed only across longer time windows?
What incentives might cause teams to keep accepting “press Tab” workflows even when churn and defect signals rise?

Key Points

1. Churn accelerated in open-source repositories, shifting from roughly six-month revision cycles to about two-week cycles on average.
2. Multiple metrics (copy-paste, duplicated blocks, and revision speed) converge on the same story: faster code addition paired with worse refactoring.
3. In 2024, copy-pasted code overtook moved/refactored code for the first time in the measured history, marking a structural change in how repos evolve.
4. Large duplicated blocks (at least five identical lines) rose sharply, reaching 6.66% of commits in 2024 from about 0.5% in 2022.
5. Churn-linked deletions increased, with about 70% of newly changed lines being revised within two weeks, while long-horizon legacy refactors declined.
6. Higher AI adoption correlates with higher defect rates (cited estimate: ~7% more defects for a 25% increase in AI usage), consistent with faster integration of unvetted code.
7. Teams may need to redefine productivity metrics to reward reuse and consolidation, not just lines added or commits produced.

Highlights

The most alarming shift is churn: code that used to be revised over ~six months is now being revised on an average ~two-week cycle.

2024 is described as a tipping point where copy-pasted code surpasses moved/refactored code, implying maintainability is falling behind output.

Duplicated blocks surged to 6.66% of commits in 2024 (five+ identical lines already existing elsewhere), raising the odds of inconsistent updates and bugs.

Churn type deletions—code deleted within about two weeks—are accelerating alongside a reported rise in defect risk with AI adoption.

Topics

Code Quality
GitHub Copilot
Code Churn
Code Duplication
Refactoring vs Copy-Paste

Mentioned

GitHub
GitHub Copilot
Google
Visual Studio Code
Chromium
GitClear
Bill Harding
AI
IDE
UX