
Cloudflare calls out Perplexity for sneaky AI scraping tactics

Perplexity claims the bot Cloudflare mentioned wasn’t theirs and didn’t actually access any content.

By Ronil Thakkar
August 5, 2025

Image: Perplexity Pro interface with Labs feature enabled. Credit: Perplexity


Cloudflare, a major internet infrastructure company, has accused AI startup Perplexity of secretly scraping data from websites that clearly said “no.”

According to Cloudflare, Perplexity has been ignoring the rules websites set to stop bots from collecting their content, and has even disguised its identity to get around those blocks.

At the heart of the issue is the robots.txt file, a plain-text file that works like a digital “Do Not Enter” sign: it tells search engines and other bots which content they are allowed to access. Compliance is voluntary, so the sign only works when bots choose to honor it.

Many websites use this to block AI companies from collecting their content for training language models.
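To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page. It uses Python's standard `urllib.robotparser` module. The bot name `ExampleAIBot`, the example.com URL, and the rules themselves are made up for illustration; they are not taken from Cloudflare's report or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed("ExampleAIBot", page))      # False
    print(is_allowed("SomeOtherCrawler", page))  # True
```

Note that the check runs on the crawler's side. Nothing in robots.txt itself stops a bot that skips this step or reports a different name.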

But Cloudflare says Perplexity is bypassing this system by changing the identifying labels its bots send with each request (called “user agents”), making them look like something they are not: a regular web browser such as Google Chrome.
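The sketch below shows why a disguised user agent defeats a simple block. It assumes a site that filters requests only by the `User-Agent` header, which is a simplification and not a description of how Cloudflare or any particular site works. The bot name `ExampleAIBot` and the header values are hypothetical.

```python
# Hypothetical blocklist of crawler names, matched case-insensitively.
BLOCKED_AGENT_TOKENS = ("exampleaibot",)


def is_blocked_by_user_agent(headers: dict[str, str]) -> bool:
    """Return True if the self-reported User-Agent matches the blocklist."""
    user_agent = headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BLOCKED_AGENT_TOKENS)


# A bot that identifies itself honestly.
declared = {"User-Agent": "ExampleAIBot/1.0"}

# The same bot claiming to be a desktop Chrome browser.
disguised = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
}

if __name__ == "__main__":
    print(is_blocked_by_user_agent(declared))   # True
    print(is_blocked_by_user_agent(disguised))  # False
```

Because the header is whatever the client chooses to send, a filter like this only catches bots that identify themselves. Catching a disguised one requires other signals.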

Cloudflare’s analysis, published Monday, shows that Perplexity’s scraping behavior was widespread, affecting tens of thousands of websites and generating millions of requests per day.

The company says it used machine learning and network signals to track and identify Perplexity’s activity.

Perplexity denied the accusations. A spokesperson called Cloudflare’s blog post a “sales pitch” and claimed the bot Cloudflare mentioned wasn’t theirs and didn’t actually access any content. (Via: TechCrunch)

But Cloudflare pushed back, saying the behavior was real and verified through customer complaints and internal testing.

The conflict adds to growing tension between AI companies and online publishers.

Cloudflare recently launched a marketplace where websites can charge AI companies to scrape their data, and previously rolled out a free tool to block AI bots entirely.

Cloudflare’s CEO, Matthew Prince, has been vocal about how AI scraping could damage the business model of content creators and publishers.

This isn’t Perplexity’s first controversy.

Last year, media outlet Wired accused the startup of plagiarizing its articles, and Perplexity’s CEO struggled to explain the company’s stance on plagiarism when asked at a tech conference.

The fight over who controls online content in the age of AI is heating up, and Perplexity’s scraping practices are under serious scrutiny.
