Meta Used Copyrighted Books for Training Its LLaMA Model, Authors File Lawsuit

NISHANT TIWARI 13 Dec, 2023 • 2 min read

Meta Platforms, formerly Facebook, finds itself entangled in legal turbulence as renowned figures like comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, among others, unite against the tech giant. The allegations suggest that Meta utilized copyrighted books, despite warnings from its legal team, to train its artificial intelligence models, sparking a contentious battle between content creators and the company. This unfolding story, first reported by Reuters, reveals a clash between Meta and the creators whose works allegedly fuel its AI advancements.

Allegations and Legal Turmoil

Mounting legal challenges surround Meta as it faces accusations of using thousands of pirated books to train its AI models. The controversy came to light through recent court filings in a copyright infringement lawsuit, as reported by Reuters, exposing a clash between prominent authors and the tech behemoth. According to the filings, Meta went ahead and used the contentious dataset despite warnings from its legal team, further deepening the legal quagmire.

United Opposition from Creators

Comedian Sarah Silverman, Pulitzer Prize winner Michael Chabon, and other notable authors have united to assert that Meta unlawfully used their works to train its artificial-intelligence language model, Llama. The latest legal submission consolidates these claims, as reported by Reuters. It adds weight to the allegations and raises questions about the ethical use of intellectual property in the tech industry.

Discord Logs and Legal Debates

A crucial part of the legal filing is a set of chat logs in which a Meta-affiliated researcher discusses acquiring the controversial dataset in a Discord server. These logs serve as potential evidence that Meta was aware that using the book files could amount to infringement. The conversations describe internal debates within Meta over whether using the dataset was permissible, which suggests the company recognized the legal uncertainty surrounding the matter.

The release of Meta’s Llama large language model, reportedly trained on the contentious dataset, has triggered concerns within the content creator community. Tech companies are increasingly facing lawsuits alleging unauthorized use of copyrighted material to fuel AI advancements. The outcome of these legal battles, as covered by Reuters, could significantly shape the future of generative AI, with potential ramifications for the cost and transparency of building data-hungry models.

Our Say

At the intersection of technological advancement and intellectual property rights, Meta is at the forefront of a complex legal battle. The alleged unauthorized use of copyrighted books raises ethical questions. Tech giants have a responsibility to respect the intellectual contributions of content creators. As the legal proceedings unfold, the tech industry awaits potential precedents that could shape the future of AI development and the relationship between technology and creativity.