In our modern communication environment, conventional wisdom very swiftly captures and narrows our channels of thought. This is due in no small part to the unceasing production of commentary, which means that every perspective on any important new issue is made available and explored (to use the digital age’s most lamentable neologism) in “real time.”
That is true already of the copyright debates around AI. In particular, it has already become conventional wisdom, and the starting point for discussion, that the use of datasets containing copyrighted works for purposes of training AI models involves reproduction of unauthorized copies of those works, and so is a prima facie infringement. The lawfulness of AI training, it is then said, can be established only by determining whether that activity constitutes fair use.
Oren Bracha’s new article on the copyright implications of AI, The Work of Copyright in the Age of Machine Reproduction,1 seeks to upset this conventional view. The process of using datasets of in-copyright works for AI training purposes, Bracha argues, does not implicate copyright’s reproduction right at all, at least if the concept of a “copy” is properly understood and informed by copyright’s essential commitments. The reason it does not involve the production of a copy flows, on Bracha’s account, from copyright’s fundamental rule against propertizing ideas and other unprotectable elements in copyrighted works.
Bracha explains why many copyright sophisticates have so readily concluded that the mass copying of copyrighted works to train AI is prima facie infringement. His shorthand characterization of this argument is “a copy is a copy is a copy.” It rests on an assumption that the unauthorized copying of an entire copyrighted work must support at least a prima facie case of copyright infringement, regardless of the purpose for which the copy was made. In this view, infringement lies based on the fact that a copy was made, simpliciter. The “why” becomes relevant only later, in the fair use analysis. Courts routinely treat fair use as a defense on which the defendant bears the burden, even if the Copyright Act explicitly directs otherwise.2
Bracha debunks this “a copy is a copy is a copy” argument as a “physicalism” fallacy and shows its inconsistency with “the basic purpose of the [copyright] field.” He argues that this purpose “is grounded in the production and use dynamics of expression and expression alone.” And hence, “[p]hysical facts—whether the making of physical objects, their display, or transfer of possession in them—are never relevant in themselves.” They are relevant “only to the extent they involve in some way the enjoyment by relevant actors of the use value of expression.”
This is why Bracha maintains that the conventional wisdom about physical copying is wrong. A “copy” made for the purpose of training AI is not the kind of copy that can infringe copyright’s reproduction right: “Making a new physical copy when the expression embodied in it will be experienced by no one is not any more relevant for copyright than using an existing copy as a doorstop.”
Moreover, the copyright irrelevance of training copies, Bracha insists, does not depend on circumstances: “Mere physical reproduction, delinked from enjoyment of the expressive value of a work and completely incidental to accessing the meta-knowledge of acquiring skill, is categorically placed outside of copyright’s domain.”
One virtue of Bracha’s argument is that it does not rest solely on theory. It attempts to ground itself instead on core copyright doctrine. Bracha argues, plausibly, that “the meta-knowledge of acquiring skill” (i.e., the thing that we want AI to “learn” when we copy copyrighted works for the purpose of AI training) belongs in the category of ideas, not expression.
Put differently, studying a work to understand its “style,” or the conventions it deploys, is, at bottom, nothing more than the discernment and consumption of ideas about that work and others like it. And if access to the work’s expression is necessary to access the work’s meta-knowledge “ideas”, then the expression is subject to merger; that is, all expression in the copyrighted work would then merge into the meta-knowledge idea. Copyright’s merger doctrine concerns copyrightability, and is not just a defense to infringement. As a consequence, the plaintiff’s infringement claim, Bracha argues, simply fails to launch. For that reason, the fate of AI training need not, Bracha argues, rest on the “slender shoulders” of the fair use doctrine.
There is much more in the article that, in similar style, attempts to upend the conventional wisdom about copyright and AI. I’ll leave those parts to this jot’s readers. I for one admired Bracha’s bold article even though I suspect his arguments are unlikely to convince courts mired in copyright formalism.
In the end, the conventional wisdom is likely to be durable. That doesn’t mean it’s right.
- I appreciate the shout-out in Bracha’s title to Walter Benjamin’s seminal 1935 essay The Work of Art in the Age of Mechanical Reproduction.
- See 17 U.S.C. § 107 (“Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work … is not an infringement of copyright.”)
A combination of wishful thinking and doctrinal hash. Of course copies are made. All this argument does is conflate “copy” with the policies of fair use in an analytically sloppy way.
Defendants in the ReDigi case make an argument about the meaning of reproduction that is analytically similar in some ways to Bracha’s argument about what a copy is in this context. Both arguments are purposive interpretations of the relevant legal terms. There’s nothing crazy about that. It’s not the way courts have typically approached these terms … but as academics that doesn’t stop us from pointing out alternatives, right?
It is an amazing paper by Professor Bracha, which he presented at the 2024 Annual International Intellectual Property Lecture and explained in a very crisp and clear way, especially through simple analogies that drive the point home. Here is the link: https://www.youtube.com/watch?v=_xkX6VOcjO4&ab_channel=CambridgeLawFaculty
Shoutout to Prof. Talha Syed from Berkeley Law, who is cited in the paper and even referenced in the talk for developing this point on de-physicalising as a tool of dereifying IP law in the context of patent theory in his wonderful piece, Reconstructing Patent Eligibility: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3699014
Thank you Chris for a very acute review that summarizes the central point of the article in a much clearer way than I ever could. I know that an author’s response is not the norm here, but I’ll hazard one anyway because the argument indeed goes against deeply ingrained conventional wisdom and as a result various parts of it are often misunderstood (although not by Chris here, I hasten to add). So to clarify a few things:
1. The argument is NOT that there is no reproduction or that copies are not made. Of course there is, and of course they are. The argument is that such reproduction is non-infringing because it does not involve any copyrightable subject matter, which is expression and expression only. Just as the copying in Nichols was not infringing because it involved using only unprotected “ideas.”
2. Once the argument is understood, the only distinction from cases like Nichols is that here, as a physical matter, the whole “work” is reproduced. But this is exactly the physicalist fallacy. Mere physical objects or acts that have nothing to do with the use of expression qua expression are not within the domain of copyright.
3. The physicalist fallacy is widespread in copyright and IP more generally. Think not only digital fair use referred to by Chris, but also the RAM copies debacle. The latter Jessica Litman very appropriately called “copy fetishism.”
For the ubiquity of the physicalist fallacy in IP generally and the need to avoid it see Talha Syed’s brilliant article on Reconstructing Patent Eligibility. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3699014
See also Abraham Drassinower in his excellent book The Problem with Copyright (the subject matter of copyright is not stains of “ink on a piece of paper”), and Remarks on Technological Neutrality in Copyright Law as a Subject Matter Problem: Lessons from Canada. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4099720
4. Since the issue is one of copyright’s subject matter, the correct conceptual tool for analyzing it is copyright’s subject matter principles. And the comparative advantages of using the correct conceptual tool are not just the practical ones of a doctrine that is more administrable and procedurally friendlier to defendants, but also, much more importantly, a far better fit between the relevant concepts and their purpose and the substantive issue at hand. Hence the article’s argument may be good or bad, but it is anything but “doctrinal.”
5. At the same time, the argument is decidedly not about “policy” (of fair use or otherwise). It is not about this or that specific “policy” as applied to the GenAI context. It is about the most fundamental concepts of what copyright is about, understood (as they must be) in light of their purpose. The whole theoretical assumption of the article is that there is such a thing as legal concepts, which are important and distinct from both: a) “doctrine” in the sense of either endlessly manipulable technical rules or formalistic norms that carry their own meaning with no reference to purpose; and b) context-specific policies, as opposed to the much more general and basic purposes that are indispensable for giving meaning to concepts. In this case the relevant concepts are those of subject matter, and their purpose is to designate the field’s informational domain (expression and expression only!). And much follows from taking these concepts and their purpose seriously, both for the GenAI infringement debate and more generally for IP (see (3) above). Not least by way of dispelling very common misunderstandings and confusions that haunt IP.
Eager to read the article. Was MAI Systems v. Peak Computer with its “RAM copy” the original sin in this tale? That thought has been niggling at me for years.
The “sin” of physicalism goes as far back as IP. Some examples that pop to mind are:
Boulton & Watt v. Bull (1795): all the judges but one cannot bring themselves to recognize the concept of a method patent because of the lack of physicality of such subject matter.
Or if you want to go further back: the use by 17th-century English stationers of the term “copy” as a quasi-physical object of property.
But yes, in modern American copyright law, MAI and its RAM copies are certainly a major highlight.