New Media Rights Executive Director Art Neill, James Thomas (CWSL ’24) and Assistant Director Erika Lee teamed up to write about an issue that many legal minds are grappling with: how does copyright law apply to the training of artificial intelligence? The rampant popularity of generative artificial intelligence in the last few years has resulted in a number of different lawsuits related to AI, including its use, training and the resulting outputs.
Our latest article, A Framework for Applying Copyright Law to the Training of Textual Generative Artificial Intelligence, was recently published in the University of Texas Intellectual Property Law Journal’s summer edition. The article delves into the complex legal landscape surrounding the use of copyrighted works to train generative artificial intelligence models. We also briefly compare how other countries are addressing the same challenges, and consider the implications for policy choices here in the U.S.
GenAI’s ability to produce human-readable text relies heavily on vast datasets, often including copyrighted material. Whether copyrighted materials may legally be used in the training process is hotly debated in courts and regulatory bodies. High-profile authors and copyright owners have filed several lawsuits against OpenAI and Microsoft, claiming that the use of their works to train the companies’ respective GenAI large language models (LLMs) infringes their copyrights. The outcomes of these lawsuits could pose challenges to the development of AI technology in different ways.
Neill, Thomas, and Lee argue that the existing U.S. copyright law framework can be applied to the training of LLMs, and that new technology doesn’t always necessitate new rules to handle it.
“These lawsuits highlight a fundamental tension between technological innovation and protecting the rights of creators. But it is important to understand that a legal framework to address these challenges already exists in our case precedents, and we can use that framework to both foster innovation and safeguard creators’ rights,” says Lee.
In the article, we explore the core legal issues in these lawsuits. We also examine the nature of the datasets and the kinds of works that are used in training the AI, and how those works are collected and curated. In doing so, we address the transitory nature of the use of these works. Under U.S. copyright law, the use of unprotected works (i.e., works in the public domain) is generally permitted, but uses of copyrighted content are generally considered infringing unless the use is licensed or could be considered fair use.
However, it is also well established that certain fleeting uses of copyrighted works may be permissible and non-infringing. This is where it is critical to understand how the training process works, because in many cases, copyrightable works may not actually be copied in a material or substantial way that would rise to the level of infringement. Where copying does occur, we argue that fair use would very likely apply as a defense against claims of infringement in the training process.
“Our clinic has a specialty in fair use matters, and works with creative and technology clients on the frontiers of artificial intelligence. Our work gave us unique insight to write this article, which comes at an important moment, when courts and policymakers must carefully maintain the balance within copyright law as they apply established law to a new technology,” says Neill.
By breaking down current lawsuits, examining the datasets used in AI training, and applying well-established legal precedents, we argue that much of the copying done during GenAI training can be legally defended as fair use. This balance is critical not only for the continued development of AI but also for maintaining international competitiveness in AI technologies.
You can read the full article here.