The Journal of Things We Like (Lots)
Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley & Percy Liang, Foundation Models and Fair Use, available at SSRN (Mar. 27, 2023).

ChatGPT, Midjourney, and Copilot are among the numerous generative AI systems launched in the last year or so. They have attracted a huge number of users as well as several lawsuits. The lawsuits claim, among other things, that the makers of these systems are direct and indirect infringers of copyright, both because they use millions of in-copyright works as training data and because the outputs of these generative AI programs are infringing derivative works.

At the core of these AI systems are foundation models, on which the authors focus in their fascinating new article. They define this term as “large pre-trained machine learning models that are used as a starting point for various computational tasks,” including generative AI systems that may produce text, images, and/or software code in response to user prompts. The article identifies various actors who contribute to elements of these AI systems, including data creators, data curators, model creators, model deployers, and model users.

Those of us who are intent on understanding the legal implications of generative AI systems must, of necessity, be prepared to learn about the technology underlying these systems. Fortunately, these six Stanford researchers—some in computer science and some in law (our own redoubtable Mark Lemley among them)—have provided an essential guide for intellectual property and technology law scholars to the development and deployment of these systems. The article explores the extent to which developers and deployers of generative AI systems may rely on fair use to justify their use of in-copyright works as training data and how developers may limit their potential liability for infringements at the output stage.

For many copyright scholars, the article’s discussion of the fair use cases will be familiar, but the application of these precedents in the context of generative AI will be particularly useful. Yes, of course, the Authors Guild v. Google and iParadigms decisions suggest that computational uses of in-copyright materials can be fair use, but other decisions such as Associated Press v. Meltwater and Fox v. TVEyes suggest that much will depend on the particular uses that generative AI systems make of the in-copyright materials.

Foundation Models is not an advocacy article asserting that all uses of in-copyright works (or at least all that can be found on the open internet) as training data are fair use. Nor does the article argue that all outputs should be non-infringing so long as the outputs are not verbatim copies of the contents of specific training data. It offers a much more nuanced perspective about the challenges for system developers in understanding how to model computationally the degree of “transformativeness” that may be achieved by a second comer’s use of copyrighted works, as well as how to distinguish facts and expressions within those works.

The most novel section of Foundation Models is its discussion of technical strategies that AI system developers can employ to reduce the risk of copyright infringement when generative AI produces outputs in response to user prompts. These include data and output filters that detect similarities between the input data and the outputs the systems generate. Some of the technical mitigation strategies the authors describe must be applied at the training data stage, while others, including data and output filters, can be applied at the deployment stage.
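For readers who want a concrete picture of what an output filter might look like, the following is a minimal sketch, not the method the authors propose. It assumes a simple word-level n-gram comparison, and the function names (`ngrams`, `overlap_ratio`, `passes_filter`), the window size of five words, and the 20 percent threshold are all hypothetical choices made for illustration.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(output, reference, n=5):
    """Fraction of the output's n-grams that also appear in reference."""
    out = ngrams(output, n)
    if not out:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(out & ngrams(reference, n)) / len(out)


def passes_filter(output, protected_works, n=5, threshold=0.2):
    """True if the output stays below the overlap threshold for every work.

    The threshold is an illustrative assumption, not a legal standard.
    """
    return all(overlap_ratio(output, work, n) < threshold
               for work in protected_works)


protected_works = [
    "the quick brown fox jumps over the lazy dog near the river bank",
]
verbatim = "the quick brown fox jumps over the lazy dog"
unrelated = "a cat sat quietly on the warm windowsill today"

print(passes_filter(verbatim, protected_works))   # verbatim copy is blocked
print(passes_filter(unrelated, protected_works))  # unrelated text passes
```

A filter of this kind catches only near-verbatim copying. It says nothing about paraphrase, style, or transformativeness, which is one reason the article treats such filters as a way to reduce risk rather than as a safe harbor.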

The article discusses the Field v. Google decision for its recognition that Field had not used the “robots.txt” exclusion standard as a technique to stop Google from webcrawling his site. This consideration weighed against Field’s copyright claim that the search engine infringed by copying his content on that site. Foundation Models suggests that generative AI system developers and deployers would be well-advised to adopt one or more technical mitigation strategies to bolster their fair use claims.
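To make the robots.txt mechanism concrete, here is a small sketch of how a crawler could check a site's exclusion rules before collecting a page, using Python's standard `urllib.robotparser` module. The robots.txt contents, the crawler name `ExampleTrainingBot`, and the helper `may_crawl` are hypothetical and are not drawn from the article or from Field v. Google.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded from the whole
# site, and every other crawler is excluded only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ExampleTrainingBot", "https://example.com/essay.html"))
print(may_crawl(ROBOTS_TXT, "OtherBot", "https://example.com/essay.html"))
print(may_crawl(ROBOTS_TXT, "OtherBot", "https://example.com/private/draft.html"))
```

The standard is purely voluntary: the file expresses the site owner's preferences, and it is up to each crawler to run a check like this one and honor the result.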

This article is well worth reading on the merits, and it is also a noteworthy contribution to an emerging literature in which computer scientists and lawyers collaborate to explore technology law and policy issues. Though its prose is perhaps not the most scintillating, the article is an outstanding example of a successful collaboration that shows how technologists and lawyers can work together to co-evolve practical means of achieving socially desirable outcomes.

Cite as: Pamela Samuelson, Generative AI Meets Copyright, JOTWELL (July 11, 2023) (reviewing Peter Henderson, Xuechen Li, Dan Jurafsky, Tatsunori Hashimoto, Mark A. Lemley & Percy Liang, Foundation Models and Fair Use, available at SSRN (Mar. 27, 2023)), https://ip.jotwell.com/generative-ai-meets-copyright/.