A group of prominent authors is suing Microsoft, alleging that the company trained its Megatron artificial intelligence on a collection of nearly 200,000 pirated books. The lawsuit spotlights the growing tension between creators and AI developers over the origins of the data used to train these models. The authors claim that the AI was specifically developed to imitate the distinct literary styles found in their copyrighted material.
Filed in New York federal court, the complaint seeks a court order barring further alleged infringement by Microsoft and statutory damages of up to $150,000 for each work allegedly misused. The plaintiffs stress that generative AI models, which produce various forms of media, depend on extensive datasets for both their learning and their output. They state explicitly that the pirated dataset was vital to the AI’s ability to replicate human creative expression.
Microsoft has not yet issued a statement regarding the lawsuit, and the authors’ attorney has declined to comment. The case arrives amid a series of landmark copyright decisions in the AI domain, including recent rulings involving Anthropic and Meta in California.
The legal landscape surrounding AI and copyright is rapidly evolving, with a growing number of lawsuits spanning various media types. From news organizations like The New York Times and Dow Jones suing AI companies over their content, to record labels and photography companies taking action, the scope of these disputes is widening. Tech companies consistently argue for “fair use,” contending that their AI models produce “transformative” content and that stringent copyright enforcement could hinder AI innovation.