Am I going to jail for web scraping?

TL;DR

A Delaware court found booking.com liable under the CFAA for scraping Ryanair’s website and using the data to book tickets for profit without authorization.

Briefing

A Delaware federal court ruling found that booking.com violated the Computer Fraud and Abuse Act (CFAA) by scraping Ryanair’s website, an outcome that puts “publicly accessible” data into a legally riskier category when scraping is tied to unauthorized access and resale. The decision matters because it signals that scraping can cross from gray-area automation into liability under a federal computer-crime statute, depending on how the access is obtained and what the scraper does with the data.

The dispute centered on booking.com scraping Ryanair ticket information and using it to book flights for profit without authorization. Booking.com countersued Ryanair for defamation after Ryanair labeled the company an “online travel agency pirate,” but the court rejected that claim. The core legal takeaway wasn’t the branding fight; it was the court’s willingness to treat the scraping conduct as actionable under the CFAA.

The transcript places this case in a broader pattern of scraping litigation. In a dispute that ended in 2015, a company called 3Taps scraped data from Craigslist for a site called PadMapper despite Craigslist blocking its IP addresses and sending a cease-and-desist letter. The court held that Craigslist could use the CFAA to protect even public data once it had cut off 3Taps’s access, and 3Taps ultimately agreed to stop scraping and pay $1 million. The message: once a site tells a scraper to stop, continuing can trigger serious legal consequences.

Yet outcomes have not been uniform. In the LinkedIn dispute, hiQ Labs scraped LinkedIn data to predict when employees might leave their jobs, and sued after LinkedIn sent it a cease-and-desist letter. In 2019, the Ninth Circuit ruled for hiQ, allowing continued access to LinkedIn’s public data; the Supreme Court later vacated that decision and sent it back for reconsideration, and the Ninth Circuit reaffirmed it in 2022. The transcript frames this as a counterweight to the Craigslist-style approach, suggesting that scraping public information may be treated differently when the conduct doesn’t involve the same kind of unauthorized access.

More recently, a lawsuit tied to AI training data also ended in a scraper-friendly direction. A judge dismissed—“with prejudice”—a case claiming GitHub Copilot violated software developers’ rights by ignoring open-source licenses when scraping code to train the tool. That dismissal, as described, means the claim couldn’t be refiled.

So will someone go to jail for web scraping? The transcript’s practical bottom line is cautious: if scraping involves publicly available data and no fraud, the odds of jail are “extremely low.” The bigger risk, it warns, is civil litigation—especially from large corporations that can impose crushing legal costs. In short, the legal boundary appears less about whether data is visible in a browser and more about authorization, intent, and whether a site has demanded the scraping stop.

Cornell Notes

A Delaware court ruled that booking.com violated the CFAA by scraping Ryanair’s website, highlighting that “publicly accessible” web data can still create federal legal risk when access is unauthorized and tied to profit. The transcript contrasts this with earlier cases where scraping public data was treated more leniently, including hiQ Labs v. LinkedIn (public data access allowed by the Ninth Circuit in 2019 and reaffirmed in 2022 after the Supreme Court sent the case back). It also notes a separate AI-related win: a judge dismissed a GitHub Copilot licensing lawsuit with prejudice. Overall, the practical guidance is that jail risk is low for scraping public data without fraud, but the financial and legal exposure from lawsuits can be severe, especially after a site blocks or demands you stop.

Why did booking.com’s scraping of Ryanair become a CFAA problem?

The court found booking.com violated the Computer Fraud and Abuse Act by scraping Ryanair’s website and then using that information to book Ryanair tickets for profit without authorization. The case also included a failed defamation countersuit by booking.com after Ryanair used terms like “online travel agency pirate,” but the legal focus remained on the scraping and unauthorized use.

What happened in the 3Taps vs. Craigslist dispute, and what precedent did it reinforce?

In a dispute that ended in 2015, 3Taps scraped Craigslist data for PadMapper even after Craigslist blocked its IP addresses and sent a cease-and-desist letter. The court held that Craigslist could use the CFAA to protect even public data, and 3Taps later agreed to stop scraping and pay $1 million, reinforcing that ignoring a demand to stop can escalate risk.

How did hiQ Labs v. LinkedIn differ from the Craigslist-style outcome?

hiQ Labs scraped LinkedIn data to predict when employees would leave their jobs, and LinkedIn also sent a cease-and-desist. Unlike the Craigslist outcome, the Ninth Circuit ruled for hiQ in 2019, allowing access to LinkedIn’s public data; the Supreme Court later vacated that ruling and sent it back, and the Ninth Circuit reaffirmed it in 2022. The transcript treats this as evidence that scraping public information may be treated differently when the conduct doesn’t involve the same kind of unauthorized access.

What does the GitHub Copilot licensing dismissal suggest about scraping for AI training?

A lawsuit claiming GitHub Copilot violated developers’ rights by scraping open-source code to train the tool was dismissed by a judge with prejudice. In the transcript’s framing, that outcome is a significant win for scrapers building AI tools that rely on open-source code, because the claim couldn’t be refiled.

If data is visible in a browser, what still determines legal risk?

Visibility alone doesn’t settle legality. The transcript emphasizes factors like authorization, whether the site blocks or demands stopping, and whether there’s intent to defraud or unauthorized resale. It also notes that terms of service and robots.txt may not technically stop scraping, but site actions like IP bans can raise the stakes if scraping continues.
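The technical signals mentioned above (robots.txt rules, block-style responses, an explicit demand to stop) can be checked in code before and during a crawl. The following is a minimal sketch of that idea and is not drawn from the transcript: the function names, the `STOP_STATUSES` set, and the example robots.txt are all hypothetical, and passing these checks says nothing about whether access is authorized in the legal sense.

```python
from urllib.robotparser import RobotFileParser

# Assumption: these HTTP statuses are treated as "the site is blocking us".
STOP_STATUSES = {403, 429}

# Hypothetical robots.txt content, inlined so the example runs offline.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_continue(status_code: int, cease_and_desist_received: bool) -> bool:
    """Stop once the site has asked us to stop or responds with a block-style status."""
    return not cease_and_desist_received and status_code not in STOP_STATUSES


if __name__ == "__main__":
    print(allowed_by_robots(EXAMPLE_ROBOTS, "example-bot", "https://example.com/listings"))
    print(allowed_by_robots(EXAMPLE_ROBOTS, "example-bot", "https://example.com/private/x"))
    print(should_continue(403, cease_and_desist_received=False))
```

The design choice here is to treat a block or a stop demand as a hard stop rather than something to route around, which mirrors the pattern in the Craigslist dispute, where continuing after an IP block and a cease-and-desist raised the stakes.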

Review Questions

Which factors in the booking.com vs. Ryanair case made the scraping legally risky under the CFAA?
How do the outcomes in the Craigslist and LinkedIn disputes differ, and what does that imply about scraping public data?
What does a “dismissed with prejudice” outcome mean for future similar claims, based on the GitHub Copilot example?

Key Points

1. A Delaware court found booking.com liable under the CFAA for scraping Ryanair’s website and using the data to book tickets for profit without authorization.
2. Booking.com’s defamation countersuit over Ryanair’s “pirate” label failed; the scraping conduct remained the main legal issue.
3. Ignoring a cease-and-desist and continuing to scrape after IP blocking can trigger CFAA exposure, as shown by 3Taps’s $1 million settlement.
4. Scraping public data isn’t automatically illegal; in hiQ Labs v. LinkedIn, the Ninth Circuit allowed access to public information and reaffirmed that ruling after the Supreme Court sent the case back.
5. AI training cases may turn on licensing and claim viability; a GitHub Copilot-related lawsuit was dismissed with prejudice, preventing refiling.
6. Even when jail risk is low for non-fraud public-data scraping, civil lawsuits from large companies can create severe financial consequences.

Highlights

The booking.com ruling treated unauthorized scraping of Ryanair’s site as CFAA-violating conduct, not merely a terms-of-service dispute.

3Taps’s continued scraping after Craigslist blocked its IP addresses led to a $1 million resolution and an explicit CFAA precedent.

In the LinkedIn case, hiQ Labs won access to LinkedIn’s public data at the Ninth Circuit, which reaffirmed that outcome after the Supreme Court sent the case back.

A GitHub Copilot licensing lawsuit was dismissed with prejudice, signaling a strong procedural end to that line of claims.

Topics

Web Scraping Law
CFAA
Computer Fraud and Abuse Act
Robots.txt
AI Training Data

Mentioned

booking.com
Ryanair
PadMapper
GitHub Copilot
LinkedIn
Craigslist
CFAA