A New Git Diff Algo
Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
GitHub’s diff experience is getting a rethink: a newer diff strategy called “commit cruncher” is designed to cut the amount of code reviewers must visually parse by recognizing more kinds of changes than the classic Myers diff. Myers diff, named for Eugene Myers and his canonical algorithm, treats line differences largely as add/delete operations between two repo snapshots. As a result, a refactor can look like a large, noisy edit even when most of the work is moving code around or making trivial updates.
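To make the add/delete framing concrete, here is a minimal sketch in Python. It assumes the standard library's `difflib.SequenceMatcher` as a stand-in: that class does not implement Myers' algorithm, but it yields the same limited vocabulary of inserted and deleted lines. The function name `add_delete_diff` and the sample snapshots are made up for illustration.

```python
from difflib import SequenceMatcher

def add_delete_diff(before, after):
    """Return ('-', line) / ('+', line) pairs; unchanged lines are omitted."""
    ops = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' is reported as deletions followed by additions,
        # because this vocabulary has no other way to express it.
        if tag in ("delete", "replace"):
            ops.extend(("-", line) for line in before[i1:i2])
        if tag in ("insert", "replace"):
            ops.extend(("+", line) for line in after[j1:j2])
    return ops

# Hypothetical snapshots: the two functions only swap places.
before = ["def helper():", "    return 1", "", "def main():", "    return helper()"]
after = ["def main():", "    return helper()", "", "def helper():", "    return 1"]

ops = add_delete_diff(before, after)
for sign, line in ops:
    print(sign, line)
```

No line's content changed in this example, yet every deleted line reappears as an added line. A reviewer has to notice that correspondence by eye.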
The research behind commit cruncher argues that expanding the diff “vocabulary” (adding operations like move, update, find/replace, and copy/paste) can produce a more compact, reviewer-friendly representation of a pull request. In practical examples, whitespace-only changes can be treated as low-signal updates so reviewers focus on meaningful edits. Refactors that extract code into a new function can be shown as a smaller set of “what changed” events rather than thousands of lines appearing as rewritten. The approach also aims to preserve context for multi-step edits—such as when a rename happens and later commits modify the renamed file—by tracing how each changed line evolves across the commit sequence.
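The idea of a richer diff vocabulary can be sketched as a post-processing pass over raw deletions and additions. This is a hypothetical heuristic, not commit cruncher's actual algorithm: the function `classify`, the operation labels, and the sample lines are all assumptions made for illustration. It pairs identical lines as moves and lines that differ only in whitespace as low-signal updates.

```python
def normalize(line):
    """Collapse runs of whitespace and strip indentation."""
    return " ".join(line.split())

def classify(deleted, added):
    """Relabel raw deletions/additions with a richer, hypothetical vocabulary."""
    ops = []
    remaining = list(deleted)
    for line in added:
        if line in remaining:
            # Identical text was removed elsewhere: treat as a move.
            remaining.remove(line)
            ops.append(("move", line))
            continue
        match = next((d for d in remaining if normalize(d) == normalize(line)), None)
        if match is not None:
            # Same tokens, different spacing: low-signal update.
            remaining.remove(match)
            ops.append(("whitespace", line))
        else:
            ops.append(("add", line))
    ops.extend(("delete", d) for d in remaining)
    return ops

# Hypothetical raw diff output.
deleted = ["def helper():", "    return 1", "total = a  +  b", "print(total)"]
added = ["def helper():", "    return 1", "total = a + b", "log(total)"]

ops = classify(deleted, added)
needs_review = [op for op in ops if op[0] in ("add", "delete")]
print(len(deleted) + len(added), "raw lines ->", len(needs_review), "needing close review")
```

Pairing lines greedily is the simplest possible choice; a real tool would presumably weigh position and surrounding context before calling something a move.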
The core workflow shift matters: Myers diff compares only the two endpoint states of the repo (before and after the change), while commit cruncher takes a more computationally intensive route by tracking each changed line through the commits where it appears, building a “commit group” view. The research claims a twofold payoff: fewer highlighted lines to review and better reviewer context. One benefit described is that hovering over a line can surface the commit messages that explain why that line ended up in its final form. Another is that when a line is moved and then modified, the diff can show the original location and evolution rather than forcing reviewers to reconstruct the story from two endpoints.
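Line tracing across commits can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the research: it handles a single file, uses `difflib.SequenceMatcher` to align consecutive snapshots, and treats a same-sized replaced block as an in-place update. The function `trace_lines` and the sample commits are hypothetical.

```python
from difflib import SequenceMatcher

def trace_lines(commits):
    """commits: list of (message, lines) in order.
    Returns [(final_line, [messages of commits that touched it])]."""
    prev_lines, history = [], []
    for message, lines in commits:
        new_history = [None] * len(lines)
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Untouched lines keep the history they already have.
                for k in range(i2 - i1):
                    new_history[j1 + k] = history[i1 + k]
            elif tag == "replace" and (i2 - i1) == (j2 - j1):
                # Assumption: same-sized replace means line-by-line edits.
                for k in range(i2 - i1):
                    new_history[j1 + k] = history[i1 + k] + [message]
        # Anything still unassigned was introduced by this commit.
        new_history = [h if h is not None else [message] for h in new_history]
        prev_lines, history = lines, new_history
    return list(zip(prev_lines, history))

# Hypothetical three-commit pull request.
commits = [
    ("Add price calculation", ["def price(x):", "    return x * 2"]),
    ("Rename price to cost", ["def cost(x):", "    return x * 2"]),
    ("Apply tax", ["def cost(x):", "    return x * 2 * TAX"]),
]

traced = trace_lines(commits)
for line, messages in traced:
    print(repr(line), "<-", messages)
```

The per-line message list is the kind of context a hover tooltip could surface. An endpoint-only comparison would show both final lines as new, with no hint that one came from a rename and the other from a later edit.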
Empirical results are presented using 12,638 pull requests processed in the second half of May 2024, spanning popular open-source projects (including React, VS Code, Chromium, and TensorFlow) and SaaS repositories. Using GitHub’s API compare endpoint as a baseline, the study reports “28% fewer lines to review” on average when commit cruncher’s diff highlighting is used instead of Myers-style highlighting. A separate experiment had 48 developers review pairs of pull requests on GitHub versus the commit cruncher platform. It found no meaningful difference in question accuracy (differences under 5%), while review duration decreased, which is the direction the researchers expected.
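An "average reduction in lines to review" can be computed in more than one way, and the summary does not say which the study used. The sketch below contrasts two plausible definitions. The function names and the sample counts are invented for illustration and are not data from the study.

```python
def mean_per_pr_reduction(pairs):
    """Average of each pull request's own fractional reduction."""
    return sum(1 - crunched / baseline for baseline, crunched in pairs) / len(pairs)

def pooled_reduction(pairs):
    """Reduction over all highlighted lines pooled together."""
    total_baseline = sum(baseline for baseline, _ in pairs)
    total_crunched = sum(crunched for _, crunched in pairs)
    return 1 - total_crunched / total_baseline

# Made-up counts, NOT from the study:
# (lines highlighted by the baseline diff, lines highlighted by the richer diff)
sample = [(100, 50), (1000, 900)]

print(f"mean per-PR reduction: {mean_per_pr_reduction(sample):.1%}")
print(f"pooled reduction:      {pooled_reduction(sample):.1%}")
```

The per-PR mean weights a small pull request as heavily as a large one, while the pooled figure is dominated by large pull requests, so the two can diverge substantially on the same data.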
Still, the evidence comes with caveats. The reported “lines to review” metric is a proxy for effort, not a direct measure of comprehension difficulty or bug-finding outcomes. Review time can drop even if the underlying cognitive load shifts elsewhere, and real-world code review includes confounders like reviewer familiarity, language expertise, and the nature of the changes. Even so, the broader takeaway is clear: diff algorithms shape how humans interpret change, and moving beyond Myers’ add/delete framing could reduce the visual noise that slows reviews—especially for refactors, moves, and incremental edits.
Cornell Notes
Myers diff—the default line-diff approach behind GitHub-style comparisons—mostly reduces changes to add/delete between two repo snapshots. Commit cruncher aims to improve that by using a richer set of diff operations (including move, update, find/replace, and copy/paste) and by tracing how changed lines evolve across the commit sequence. The claimed result is less reviewer-visible “diff noise”: a study of 12,638 pull requests reported about 28% fewer highlighted lines to review on average. In a separate user study with 48 developers, question accuracy stayed essentially the same (differences under 5%), while review duration decreased. The practical implication is that diff representation can materially affect review throughput, though “fewer lines” is still an indirect measure of bug-finding quality.
Why does Myers diff often make refactors look worse than they are?
What is the key difference in how commit cruncher builds a diff?
How do richer diff operations reduce reviewer workload?
What evidence is used to claim a reduction in review effort?
Did reviewers actually understand the code better, or just review faster?
Review Questions
- How does endpoint-only diffing (before vs after) tend to misrepresent moves and refactors compared with line-tracing across commits?
- What does a “28% fewer lines to review” metric measure, and what important real-world outcomes might it fail to capture?
- Why might review duration decrease without a measurable improvement in question accuracy?
Key Points
1. Myers diff represents changes mainly as add/delete between two repo snapshots, which can inflate visual noise for moves, renames, and refactors.
2. Commit cruncher aims to reduce that noise by using additional diff operations such as move, update, find/replace, and copy/paste.
3. Instead of only comparing pre- and post-commit states, commit cruncher traces changed lines through the commits where they appear to build a more contextual diff view.
4. The research reports about 28% fewer highlighted lines to review across 12,638 pull requests, using GitHub compare output as a baseline.
5. A user study with 48 developers found question accuracy differences under 5% while review duration decreased, implying faster review without clear accuracy gains.
6. “Fewer lines to review” is a proxy for effort; it does not directly prove fewer bugs or better production outcomes.
7. Diff representation can change reviewer attention and interpretation, so algorithmic choices can affect developer workflow even when the underlying code change is the same.