
What's Behind the ChatGPT History Change? How You Can Benefit + The 6 New Developments This Week

AI Explained · 6 min read

Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

ChatGPT’s “turn off chat history” setting is described as applying only to conversations started after the change, not necessarily to older chats.

Briefing

A new ChatGPT setting that lets users “turn off chat history” is drawing attention less for privacy optics and more for what it may signal about OpenAI’s data practices, and about the legal and regulatory pressure that could reshape how AI companies collect training material. The change, announced by Sam Altman in a recent tweet, allows users to disable chat history and training for conversations started after the switch. But existing chats appear to remain eligible for training by default, and the interface links “chat history” and “training” into a single on/off choice rather than offering a clean separation between keeping personal records and opting out of model improvement.

The practical takeaway is immediate: users can check the setting via the three-dots menu in a ChatGPT conversation, then go to Settings and the relevant “data controls” area. If chat history is disabled, OpenAI still monitors chats for “abuse,” even though the user is opting out of providing data for training. For people who want to retain their chat history while avoiding training, the transcript suggests that is not straightforward, because the product design effectively forces a tradeoff. There is also an opt-out form, but it comes with a warning that opting out may limit how well models address a specific use case.

Amid the privacy controversy, the feature may offer one clear benefit: an “export data” option buried in the settings that triggers an email containing a link to download a full conversation history archive. That export can be searched across all chats from the start of use to the present, giving users a concrete way to audit what they typed.

Why the timing matters: the transcript ties the announcement to looming compliance deadlines under Europe’s GDPR. An MIT Technology Review report is cited arguing that it may be impossible for OpenAI to fully comply, because AI training data is collected before any user can opt in. The stakes are framed as existential: potential bans, large fines, and even orders to delete models and training data if regulators conclude the data use was illegal. Regulators across jurisdictions “from Brazil to California” are expected to watch the outcome, with the implication that GDPR-style enforcement could become a global template.

Beyond regulation, the transcript points to a widening web of data sourcing disputes and copyright claims. Examples include harvesting pirated ebooks from a site formerly known as “book ZZ” (with content still present in Common Crawl), use of Common Crawl and “The Pile” (which includes pirated books and other sensitive material), and the possibility that training sets may contain unexpected content—such as benchmark data mixed in “inadvertently,” according to a GPT-4 technical report footnote. It also raises the question of whether data contributors get paid: Reddit users, Wikipedia editors, and Stack Overflow contributors are all mentioned as potential stakeholders, with claims that major platforms are moving toward charging AI companies for training data.

The transcript then shifts to litigation and future harm. Lawsuits involving Microsoft, GitHub, and OpenAI are described, including arguments about whether plaintiffs can prove injury from tools like GitHub Copilot. The concern is that proving job loss could become easier as AI systems replace more workers, but that blocking access to a model could also create new arguments about who is “injured.” In parallel, publishers and journalists are referenced through claims that proprietary content is being used without compensation.

Finally, the transcript speculates about how this could evolve: OpenAI’s data spend might fall if models generate synthetic training data or reduce reliance on human feedback loops. Yet the immediate reality remains that users are being offered a more granular control surface while regulators and courts push AI firms toward stricter accountability for where training data comes from and who benefits from it.

Cornell Notes

ChatGPT’s new “turn off chat history” control is presented as a privacy lever, but it appears to be tightly coupled to training: conversations started after disabling history won’t be used for training, while older chats may still be eligible by default. The transcript argues this design makes it harder to keep chat logs while opting out of training. It also highlights an “export data” feature that lets users download and search their full conversation history, offering a practical audit tool. The broader context is regulatory risk under GDPR, where regulators could impose bans, fines, or even orders to delete models and training data. Copyright and scraping disputes—along with lawsuits and questions about whether contributors get paid—are framed as the next pressure points for AI companies.

What does “turn off chat history” actually change for training, and what doesn’t it change?

The control applies to conversations started after chat history is disabled. Those new chats are not supposed to be used to train and improve OpenAI’s models. Existing conversations, created before the setting was changed, are described as still being used by default for training new models. The transcript also notes that chat monitoring for “abuse” continues even when history/training is disabled.

Why is the setting described as less flexible than it sounds?

Instead of offering separate toggles for “store chat history” and “opt out of training,” the transcript describes a single linked choice: it’s effectively “both or neither.” That means users who want to keep their chat history for later review may not be able to fully opt out of training through the same interface.

What practical tool does the transcript highlight for users who want to audit their data?

An “export data” button in the settings triggers an email with a link to download a data export containing conversation history. After downloading and opening the file, users can search through prior conversations—described as including essentially all chats from the time they first used ChatGPT up to the present.
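As a rough illustration of that auditing step, the export archive typically contains a `conversations.json` file. Assuming that layout (a list of conversations, each with a `title` and a `mapping` of message nodes — this structure is an assumption, not something stated in the transcript), a short script can search every past chat for a term:

```python
import json

def search_chats(path, term):
    """Search a ChatGPT data export for a case-insensitive term.

    Assumed layout (not guaranteed by the transcript): conversations.json
    holds a list of conversations, each with a "title" and a "mapping"
    of message nodes whose text sits in message["content"]["parts"].
    Returns (title, snippet) pairs for every matching message.
    """
    with open(path, encoding="utf-8") as f:
        conversations = json.load(f)

    hits = []
    for convo in conversations:
        for node in convo.get("mapping", {}).values():
            msg = node.get("message") or {}
            parts = (msg.get("content") or {}).get("parts") or []
            # Parts may mix strings and other payloads; keep text only.
            text = " ".join(p for p in parts if isinstance(p, str))
            if term.lower() in text.lower():
                hits.append((convo.get("title", "untitled"), text[:80]))
    return hits

# Example usage:
# matches = search_chats("conversations.json", "GDPR")
```

This is only a sketch; if OpenAI changes the export format, the field names would need adjusting, but the idea — a local, offline search over everything you ever typed — matches the audit use the transcript describes.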

How does GDPR enforcement enter the picture, and what could happen if compliance fails?

The transcript links the timing to GDPR deadlines and argues compliance may be difficult because training data is collected before users can meaningfully opt in. It cites the possibility of regulators defining OpenAI’s data practices as potentially illegal, leading to outcomes such as bans, hefty fines, and even forced deletion of models and the training data used to build them.

What kinds of training-data disputes are raised beyond GDPR?

The transcript points to copyright and scraping controversies: Common Crawl use, pirated ebook sources (including a site formerly known as “book ZZ”), and “The Pile,” which is described as containing pirated books and other sensitive material. It also raises the idea that training sets may include unexpected benchmark content mixed in “inadvertently,” and it questions whether contributors like Reddit users, Wikipedia editors, and Stack Overflow contributors receive compensation.

Why do lawsuits hinge on proving “injury,” and how might that change over time?

One described challenge is that plaintiffs may struggle to show personal harm from tools like GitHub Copilot, with defendants arguing claims rely on hypothetical events. The transcript suggests that as AI replaces more jobs, plaintiffs could more clearly demonstrate injury (e.g., job loss tied to a specific tool). It also notes new complications if access to a model is blocked, potentially widening who can claim harm.

Review Questions

  1. How does the transcript distinguish between disabling chat history for future conversations versus opting out for past conversations?
  2. What regulatory remedies under GDPR does the transcript say could extend beyond fines to model or data deletion?
  3. Which examples are used to question whether training-data contributors are compensated, and what legal mechanism is implied to make compensation more likely?

Key Points

  1. ChatGPT’s “turn off chat history” setting is described as applying only to conversations started after the change, not necessarily to older chats.
  2. The transcript claims chat history and training are linked into a single choice, making it harder to keep logs while opting out of training.
  3. An “export data” option provides a way to download and search a user’s full conversation history via an emailed link.
  4. GDPR enforcement is framed as a major driver of urgency, with potential outcomes including bans, large fines, and possible deletion of models and training data.
  5. The transcript lists multiple training-data controversies involving Common Crawl, pirated ebook sources, and “The Pile,” plus concerns about unexpected benchmark mixing.
  6. A recurring theme is whether data contributors (Reddit, Wikipedia, Stack Overflow) receive compensation as AI companies monetize training.
  7. Litigation is portrayed as hinging on proving “injury,” which may become easier as AI tools more directly displace jobs.

Highlights

  • The transcript portrays the new setting as a tradeoff: disabling chat history appears to also disable training for new conversations, while older conversations may still be used by default.
  • An “export data” button can generate an email link to download a searchable archive of essentially all past ChatGPT conversations.
  • GDPR is presented as potentially capable of forcing not just fines but bans and deletion of models and training data if regulators find data practices illegal.

Topics

  • ChatGPT History Controls
  • GDPR Compliance
  • Training Data Controversies
  • Copyright and Scraping
  • AI Lawsuits

Mentioned