The Journal of Things We Like (Lots)
Christopher Buccafusco & Rebecca Tushnet, Of Bass Notes and Base Rates: Avoiding Mistaken Inferences About Copying, __ Hous. L. Rev. __ (forthcoming, 2023).

Some years ago I attended a presentation by a musicologist who specialized in giving testimony in copyright litigation. Here’s how he tried to grab the audience: First, he would play a clip from a well-known track by a popular musician or band. Then he would play a selection from an earlier, lesser-known track by an obscure musician or band that sounded similar to the first clip, all while giving the audience a wide-eyed stare. The impression this created was intentional and unmistakable. Clearly the well-known artist had copied from the lesser-known one!

The audience, mostly laypeople, certainly bought it, judging by the gasps that accompanied the presenter’s schtick. I did not, and left frustrated that the musicologist-turned-expert-witness had tricked the audience into thinking that he had exposed several instances of egregious copyright infringement. I knew something was wrong but had difficulty putting my finger on just what the problem was with the presenter’s move.

Thanks to Christopher Buccafusco and Rebecca Tushnet’s sparkling essay, Of Bass Notes and Base Rates: Avoiding Mistaken Inferences About Copying, I finally have a clear picture of the error that afflicted that presentation and so much copyright litigation. As the authors explain, the application of copyright’s substantial similarity doctrine suffers from base rate neglect, which causes courts and litigants to significantly overstate the likelihood that a defendant copied from a plaintiff.

Let’s unpack this a bit. Plaintiffs in copyright infringement litigation must make a threshold showing that the defendant copied their work. This usually entails proof that the defendant had access to the plaintiff’s work and that the works share similarities that are probative of copying. Answering the latter question in the plaintiff’s favor depends on their amassing enough evidence to support an inference of copying. At this stage, plaintiffs often retain expert witnesses—usually musicologists,1 like the speaker I mentioned above—who testify that the degree of similarity between the two works is so great and distinctive that it could not be explained by, for example, independent creation or copying from a public domain source.

Here, the authors argue, is where base rate error creeps in. When a musicologist testifies that the quantum of similarity between two works enables an inference of copying, that testimony ignores the crucial issue of how likely such similarity would be absent copying (the base rate). After all, there are only so many appealing chord progressions out there, and the plaintiff presumably created the work herself, so unless we know how likely it is that the similarities at issue would occur in the normal course, it is impossible to say whether the similarities in a given dispute are truly probative of copying.
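The logic of base rate neglect can be made concrete with a short numerical sketch (the figures below are purely hypothetical, chosen for illustration, not drawn from the essay or any empirical study): even if copying almost always produces the observed similarity, a modest base rate of innocent similarity can leave the probability of copying surprisingly low.

```python
# Hypothetical Bayesian sketch of base rate neglect.
# All numbers are illustrative assumptions, not empirical estimates.

def posterior_copying(prior_copying, p_sim_given_copying, p_sim_given_innocent):
    """P(copying | observed similarity), via Bayes' rule."""
    p_similarity = (prior_copying * p_sim_given_copying
                    + (1 - prior_copying) * p_sim_given_innocent)
    return prior_copying * p_sim_given_copying / p_similarity

# Suppose only 1% of song pairs involve copying (the prior),
# copying virtually always yields the observed similarity (0.99),
# and the similarity arises innocently 10% of the time (the base rate).
p = posterior_copying(0.01, 0.99, 0.10)
print(f"{p:.3f}")  # prints 0.091: similarity alone is weak evidence of copying
```

On these assumed numbers, the posterior probability of copying is only about 9%, even though copying nearly guarantees the similarity. An expert who reasons only from the degree of similarity, without the base rate, implicitly treats that 9% as a near certainty.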

The authors state that their argument is “modest.” I dissent. Not only did their central insight help me understand what was wrong with the musicologist’s presentation I resented so much (thanks!) but it has the potential to change the way we think about and conduct copyright litigation. The past couple of decades have seen an uptick in infringement lawsuits by lesser-known artists alleging that hit tracks by popular artists (e.g., Katy Perry, Led Zeppelin, Marvin Gaye, Ed Sheeran, Taylor Swift, Lana Del Rey, Dua Lipa, and many, many more) infringe their earlier, lesser-known works. A central feature of these plaintiffs’ cases is the testimony of musicologists that the degree of similarity between the works in suit is so great that it is explicable only by the defendant’s copying.

But Buccafusco and Tushnet have shown that the expert testimony in these cases shares a common, fatal flaw: base rate neglect. Not one of the experts in any of these cases actually knew how likely it was that the given similarity would occur absent copying. This means that the experts’ central evidentiary contribution to the litigation (quantum of similarity implies copying) is not a reliable, informed opinion but just highly articulate hand-waving. Many critics decry the purported irrelevance of legal scholarship to law, but this essay represents a conspicuous counterpoint.

The authors limit their discussion of the implications of their insight to the admissibility of expert testimony. Yet it may have other important payoffs. For example, some courts have (controversially) held that where the degree of similarity between two works is extremely high, that enables an inference of copying, regardless of evidence of access. Judges applying this “striking similarity” doctrine often commit base rate errors, assuming that a high degree of similarity itself warrants a conclusion that the defendant copied the plaintiff’s work absent any sense of how likely the similarity would otherwise be (i.e., the base rate).

Another tantalizing question provoked by this essay is whether and how the base rate problem varies within and across genres. Music is especially ripe for this fallacy because the number of notes in the scale is low and the number of appealing combinations of them is lower still, all of which suggests a higher base rate of similarity. But some genres of music are notoriously self-similar, such as ska or reggae, which are defined by a syncopated guitar rhythm. Musical works in these categories are especially likely to have a high base rate of similarity. The same may be true in some literary (formulaic romance or fantasy novels) and artistic (traditional portraiture) contexts that are defined by certain core features. By contrast, the alphabet yields many more available combinations of appealing words and phrases than the musical scale does appealing progressions of notes, so the base rate of similarity in most literary works is likely lower. None of these assertions about relative base rates can be reduced to specific numbers (the authors point out that we cannot actually know the base rate of similarity in any genre) but they each illustrate how the idea of base rates serves as a useful heuristic when thinking about copyright infringement generally.

The authors conclude that because we cannot deduce the base rate of similarity in any instance of claimed infringement, at present the wisest move is to simply bar experts from rendering conclusions about whether similarity supports an inference of copying. And while this does reflect the current state of play, it raises an intriguing possibility that may be realized sooner than one might think. If a large language model can learn from many billions of data points to produce astonishingly plausible simulacra of human responses to prompts, why couldn’t a similar artificial intelligence learn from the millions of available tracks to estimate the likelihood that certain similarities between musical works occur naturally? Big data might hold the solution to copyright’s base rate neglect problem.

By importing a known but underappreciated idea from quantitative analysis, Christopher Buccafusco and Rebecca Tushnet have generated a simple but significant insight that has the potential to change the way lawyers, judges, and academics think about both the doctrine of copyright infringement and how it unfolds in litigation. And they do it all in under 9000 words. This article is based and I rate it very highly.

  1. Judges’ reliance on musicologists to compare musical works in suit to preexisting compositions has grown so prevalent that some authors have compared it to the patent law practice of assessing inventions in light of prior art. See Joseph Fishman & Kristelia Garcia, Authoring Prior Art, 75 Vand. L. Rev. 1159 (2022).
Cite as: David Fagundes, All About That Base Rate, JOTWELL (November 15, 2023) (reviewing Christopher Buccafusco & Rebecca Tushnet, Of Bass Notes and Base Rates: Avoiding Mistaken Inferences About Copying, __ Hous. L. Rev. __ (forthcoming, 2023)), https://ip.jotwell.com/all-about-that-base-rate/.