Adobe’s AI faces a copyright reality check
Adobe’s AI education may have involved a little too much “borrowed” reading material.
Adobe has gone all-in on AI, launching tools, sprinkling “generative” on everything, and rolling out Firefly, its shiny AI-powered creative platform, to prove it can also do the future.
But this week, Adobe’s AI ambitions wandered into familiar legal quicksand: a lawsuit accusing the company of training one of its AI models on pirated books.
The proposed class-action lawsuit was filed on behalf of Elizabeth Lyon, a nonfiction author from Oregon, who claims Adobe used unauthorized copies of her books to train an AI model called SlimLM.
SlimLM, as Adobe describes it, is a small language model designed for document assistance tasks, particularly on mobile devices.
Adobe says the model was trained using SlimPajama-627B, an open-source dataset released by Cerebras in mid-2023.
That sounds reassuring, until you read the lawsuit’s footnotes, where things get spicy.
Lyon’s lawyers argue that SlimPajama is essentially a remix of another dataset called RedPajama, which itself allegedly includes Books3: a massive collection of roughly 191,000 books that has haunted AI companies like a copyright poltergeist.
According to the lawsuit (first reported by Reuters), SlimPajama contains derivative copies of Books3, and therefore copyrighted works, including Lyon’s.
Books3 has become the “you again?” of AI litigation.
It’s popped up in lawsuits against Apple over its Apple Intelligence platform and against Salesforce for allegedly training AI systems on copyrighted material without permission, credit, or compensation.
At this point, if your dataset includes Books3, a lawyer may already be drafting a complaint.
Sadly for the tech industry, this is no longer shocking news. Training modern AI models requires enormous datasets, and those datasets sometimes wander into legally questionable territory.
Just last year, Anthropic agreed to pay $1.5 billion to authors who accused it of using pirated works to train its chatbot, Claude, a settlement widely seen as a warning shot.
For Adobe, the lawsuit is another reminder that AI may be fast, powerful, and creative, but copyright law still reads everything very, very carefully.
