
How does AI detection work? I designed an EXPERIMENT to show you

4 min read

Based on qualitative researcher Dr Kriukow's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.

TL;DR

AI detectors score full-document context, not isolated sentences or words.

Briefing

AI detection systems don’t judge text sentence-by-sentence; they score documents by looking at full-context patterns. That matters because a workflow that “humanizes” or rewrites sections until a free checker reports 0% can still trigger a high AI score later when a submission tool (such as Turnitin) scans the entire paper at once.

The core mechanism described is context dependence. Detectors analyze the whole document—especially how different passages appear together—rather than treating each paragraph as an isolated unit. This means proximity effects can occur: human-written sections can be flagged as AI-generated when they sit near AI-written sections or when AI-like patterns cluster across boundaries. The practical takeaway is straightforward: feed the detector the full document (or at least all AI-generated text together), not fragmented chunks.
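This context effect can be illustrated without knowing any detector's internals. The toy scorer below (an illustration only, not a real detector; the phrase list and scoring rule are invented for this sketch) counts whether stock phrases repeat within whatever text it is handed. Scored half by half, nothing repeats; scored as one document, the cross-passage repetition appears.

```python
# Toy illustration (NOT a real detector): score text by whether a set of
# stock phrases repeat within it. Real detectors use statistical language
# models, but the context effect is the same: the score depends on what
# portion of the document the scorer is given.

STOCK_PHRASES = ["delve into", "it is important to note", "in conclusion",
                 "furthermore", "moreover"]

def toy_ai_score(text: str) -> float:
    """Fraction of stock phrases that appear at least twice in the text."""
    text = text.lower()
    repeated = sum(1 for p in STOCK_PHRASES if text.count(p) >= 2)
    return repeated / len(STOCK_PHRASES)

half_a = "We delve into the data. Furthermore, results vary."
half_b = "Furthermore, we delve into themes. It is important to note trends."

# Each half alone: no phrase repeats within a single half.
print(toy_ai_score(half_a), toy_ai_score(half_b))   # 0.0 0.0

# The assembled document: "delve into" and "furthermore" now each repeat.
print(toy_ai_score(half_a + " " + half_b))          # 0.4
```

The point of the sketch is only that the same text can score clean in fragments and flagged when assembled, because the repeated-pattern signal exists only across the full document.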

A key warning targets a common workaround: splitting long text into pieces to fit the word limits of free online detectors, then running each chunk separately and pasting the “cleaned” results back into the original document. According to the explanation, this changes what the detector can see. If a free tool only receives half the text, it can’t evaluate cross-document patterns that emerge only when the complete set of passages is present. When the fully assembled document is later scanned by a tool that has access to 100% of the content, the scoring can swing dramatically.

To demonstrate the effect, an experiment was run using a hypothetical “findings” chapter generated by ChatGPT (spelled “Chad GBD” in the transcript). The generated chapter was first placed into Microsoft Word, then split into two parts because it was too long for the free detector being used. Each half was scanned and edited until the free tool reported 0% AI-generated content for that section.

After both halves were pasted back into a single document, a subsequent scan using a detector that considers the full document produced the opposite result: the combined submission was flagged at 100% AI-generated content. The explanation attributes the reversal to pattern detection across the entire text—patterns that the chunk-based workflow prevented the first tool from observing.

Beyond context, the transcript also describes how detectors rely on large language corpora containing millions of human-written and AI-generated documents. Detectors compare expressions against these datasets and compute likelihoods for combinations of phrasing appearing together, even when individual phrases look ordinary. The math is described as too complex for most people to replicate manually.
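A rough sketch of that "combinations of phrasing" idea follows. The document frequencies below are made-up numbers for illustration (real detectors draw on corpora of millions of documents and far richer features), but they show why each phrase can look ordinary on its own while the combination is statistically surprising in human writing.

```python
import math

# Hypothetical document frequencies: probability that a human-written
# document contains each phrase. These numbers are invented for this
# sketch, not taken from any real corpus.
DOC_FREQ = {
    "delve into": 0.05,
    "tapestry of": 0.01,
    "it is important to note": 0.10,
}

def combination_surprise(phrases, human_freq=DOC_FREQ):
    """Negative log-probability that a human document contains all of
    these phrases together, assuming independence: higher = more
    surprising for human text."""
    log_p = sum(math.log(human_freq[p]) for p in phrases)
    return -log_p

# One phrase alone is unremarkable; the combination is much rarer.
print(combination_surprise(["delve into"]))                # ~3.0
print(combination_surprise(["delve into", "tapestry of",
                            "it is important to note"]))   # ~9.9
```

Under this (simplistic) independence assumption the surprise of a combination is just the sum of the per-phrase surprises; real systems model dependencies between phrases too, which is exactly why they need the full document in view.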

The bottom line is not that rewriting is pointless, but that evaluation depends on what the detector is allowed to see. For longer assignments, the safest approach is to provide full context to the same type of checker used for submission, because fragmenting text can create a false sense of success—and then a later scan can flag the assembled work as highly AI-generated.

Cornell Notes

AI detection scores depend heavily on full-document context, not isolated sentences or paragraphs. When long text is split to fit a free detector’s word limit, the tool only sees part of the patterns that emerge across the entire document. In a described experiment, a ChatGPT-generated hypothetical findings chapter was divided into two halves, edited until a free online checker reported 0% AI content for each half, and then pasted back together. Once the combined document was scanned with a detector that analyzed the whole text, it was flagged at 100% AI-generated content. The lesson is to avoid chunk-and-repaste workflows and instead evaluate using the full context (or all AI-generated sections together) to match how submission tools score papers.

Why can a document show 0% AI content in a free checker but later be flagged as highly AI-generated?

Because the scoring is context-dependent. Free tools with word limits may only analyze a portion of the text (e.g., half). After chunk-by-chunk editing, the assembled document contains cross-passage patterns the first tool never saw. When a submission-style detector scans the full document, it can detect those combined patterns and the score can jump dramatically.

What role does “proximity” play in AI detection?

Human-written sections can still be flagged as AI-generated when they sit near AI-generated sections. The transcript attributes this to how detectors evaluate patterns across the document, so passages close to AI-like text may inherit the same statistical signals even if they were written manually.

What was the structure of the experiment described?

A hypothetical findings chapter was generated by ChatGPT (spelled “Chad GBD” in the transcript), pasted into Microsoft Word, and split into two parts due to length limits. Each part was scanned in a free online detector and edited until it reached 0% AI-generated content. Both parts were then pasted back into one document, and a full-document scan produced a 100% AI-generated result.

How do detectors use large datasets to score text?

Detectors compare expressions against huge corpora containing millions of human-written and AI-generated documents. Even if phrases seem common individually, the system estimates the likelihood of particular combinations appearing together in the same document. That cross-phrase likelihood is part of the scoring.

What workflow should be avoided for longer assignments?

Avoid breaking AI-generated content into separate pieces, running detection on each piece independently, and then pasting the results back together. The transcript argues this changes the detector’s available context and can lead to a mismatch between “cleaned” chunk scores and the final full-document score.

Review Questions

  1. How does limiting a detector to part of a document change what patterns it can detect?
  2. What does the chunk-and-repaste experiment suggest about relying on free AI checkers for final submission decisions?
  3. Why might human-written text still be flagged when it appears near AI-generated passages?

Key Points

  1. AI detectors score full-document context, not isolated sentences or words.
  2. Splitting text to fit a free detector's word limit can prevent detection of cross-passage patterns.
  3. A chunk-by-chunk workflow can produce misleading results when a later tool scans the entire assembled document.
  4. Human-written sections may be flagged due to proximity to AI-generated sections and shared document-level patterns.
  5. Detectors rely on large language corpora and compute likelihoods for combinations of phrasing, not just single phrases.
  6. For longer papers, evaluate using the full context (or all relevant AI-generated sections together) to match submission-style scanning.

Highlights

  • Chunking text for free detection can create a false sense of safety: each half can hit 0%, then the combined document can be flagged at 100%.
  • AI detection is described as context-driven, so proximity between AI-like and human-written passages can affect scores.
  • Detectors use massive datasets of human and AI writing to estimate how likely specific combinations of expressions are.
  • The transcript emphasizes that what the detector can see (50% vs 100% of the text) can determine the final score.

Topics

  • AI Detection
  • Turnitin
  • Context Dependence
  • Word Limits
  • Text Chunking