Reddit drags data scraper startups to court
Reddit’s complaint targets four defendants, with the most headline-grabbing being Perplexity AI.
Imagine this: a website tells you, “No bots allowed!” when it comes to scraping its precious text for AI training.
So, instead of sending your data-hungry robots directly to that site, you send them to Google search results, which conveniently display the same text. Is that clever? Shady? Both?
That’s the philosophical (and legal) riddle at the heart of Reddit’s new lawsuit, filed Wednesday in New York, which might just become the next big courtroom showdown in the ongoing war between content platforms and AI data scrapers.
Reddit’s complaint targets four defendants, with the most headline-grabbing being Perplexity AI, the startup already famous, or infamous, for playing fast and loose with other people’s content.
The other three, SerpApi (Texas), Oxylabs (Lithuania), and AWMProxy (Russia), allegedly used clever detours: instead of scraping Reddit directly, they scraped Google pages containing Reddit data, then packaged and resold that information to big names like OpenAI and Meta.
According to Reddit, that’s basically like stealing your neighbor’s Wi-Fi password and claiming you never touched their router.
The company is seeking damages and a permanent injunction to stop the scraping spree for good.
It’s not the first fight of its kind, either. Earlier this month, LinkedIn sued ProAPIs for using fake accounts to collect user data, and Reddit had previously accused Anthropic of claiming it had stopped scraping, only to hit the site more than 100,000 additional times.
Of course, winning these lawsuits isn’t easy. For one, Reddit filed in New York, but most of these companies are conveniently not based in New York, or even in the US.
And there’s the bigger legal question: can a company really “own” public data once it’s online?
An Oxylabs spokesperson told The New York Times, “No company should claim ownership of public data that does not belong to them.”
Courts have sometimes agreed, like when a judge dismissed a similar case brought by Elon Musk’s X last year, warning that too much control over web data could create “information monopolies.”
So, are these AI firms digital pirates or just clever entrepreneurs navigating legal gray waters? For now, Reddit’s hoping the court says it’s the former, but the bots are definitely watching.
