Apple Faces Copyright Claims Over AI Training Data Amid Open Source Disputes
On March 18, Apple was once again named as a defendant in a copyright infringement lawsuit filed by Chicken Soup for the Soul, LLC. The suit alleges that Apple used "The Pile" dataset, which contains pirated books, to train artificial intelligence models. The wide-ranging litigation also targets other major technology companies, including Meta, xAI, Google, Anthropic, OpenAI, Perplexity, and NVIDIA. At the heart of the case is "Books3", a shadow-library component of the dataset that holds a vast collection of copyrighted literary works.

In response to the claims, Apple reiterated that it has been committed to developing AI datasets legally and ethically since 2024. Apple researchers did use "The Pile" in the open-source OpenELM project, but the company said this was solely for public research and that the dataset was not used in its core Apple Intelligence system. Legal analysts, however, point out a potential complication: because Apple's foundation model received assistance from Google Gemini, the technical supply-chain relationship between the two companies could expose Apple to complex joint liability if Google is found to have breached regulations.

Meanwhile, companies such as Perplexity have defended their web scraping practices, while Apple continues to emphasize the transparency and compliance of its own model training. As AI regulations become more stringent, this class-action lawsuit, which targets foundational training data, represents a significant escalation in creators' pushback against what they see as "data exploitation" by tech giants. It is also expected to compel the industry to reassess the compliance costs and technical limits of implementing "data traceability" in model development.