Estimating the Value of the Public Domain

Paul J. Heald, Kris Erickson and Martin Kretschmer, The Valuation of Unprotected Works: A Case Study of Public Domain Photographs on Wikipedia, 28 Harvard J.L. & Tech. (forthcoming, 2015), available at SSRN.

By now, most Jotwell readers will be familiar with the terrific empirical research that Paul Heald has been doing on the public domain. Now, Paul has teamed up with Kristopher Erickson and Martin Kretschmer, scholars at the University of Glasgow and the CREATe centre (which stands for Creativity, Regulation, Enterprise, and Technology). CREATe is a publicly funded multi-disciplinary program that provides research support to produce evidence-based assessments of IP policies—something I think we can all agree that we like lots.

Heald, Erickson, and Kretschmer (HEK) have recently posted a new paper that presents a section from CREATe’s larger empirical project on copyright and the value of the public domain. I strongly recommend the entire report, which includes two separate empirical studies, but will focus my comments on the shorter paper.

The authors begin by noting that copyright owners have become adept at offering quantitative assessments of the economic value that copyright industries produce. Although there are numerous estimates of the value of copyright law, there have been very few attempts to measure the economic value of the public domain. HEK’s paper begins to balance the ledger by estimating the value of a robust public domain for creative reuse.

To do so, the authors modify and extend a technique that was recently introduced by Abhishek Nagaraj at MIT. The basic idea is to analyze how Wikipedia pages use photographs in cases where the availability of those photographs depends on their public domain status. HEK study the use of photographs of successful literary authors on their Wikipedia pages.

The sample includes 362 authors who had at least one New York Times bestseller from 1895 to 1969. The authors were born between 1829 and 1942 and, thus, span the 1923 public domain/copyright divide. Authors who were born and died before 1923 can only be represented by public domain images; authors born after 1923 can only be represented by copyright-eligible images; and authors whose lives span the divide can be represented by both. HEK hypothesize that, even though far fewer images exist of earlier authors, those authors’ pages will be more likely to include an image than later-born authors’ pages. This is because public domain images can be freely used, whereas copyrighted images likely have to be licensed.
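The three-way split around 1923 can be written as a small classification rule. The following is a minimal sketch, not HEK's actual coding scheme: the function name, the category labels, and the handling of authors born in exactly 1923 or still living are assumptions made for illustration.

```python
from typing import Optional

DIVIDE_YEAR = 1923  # public domain/copyright divide described in the review


def image_availability(born: int, died: Optional[int]) -> str:
    """Classify an author by which kinds of images could depict them.

    Assumptions (not from the paper): died=None means the author is
    still living, and a birth year of exactly 1923 counts as post-divide.
    """
    if died is not None and died < DIVIDE_YEAR:
        # Entire life falls before the divide.
        return "public domain only"
    if born >= DIVIDE_YEAR:
        # Entire life falls on the copyright side of the divide.
        return "copyright-eligible only"
    # Born before the divide, died (or still living) after it.
    return "both"
```

For instance, a hypothetical author who lived from 1829 to 1900 falls in the first group, while one who lived from 1900 to 1960 falls in the third.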

Their results support this hypothesis. While authors born after 1920 have about an even chance of being represented by a photo on their Wikipedia pages, authors born between 1850 and 1880 have about a 90% chance of being represented by a photograph. The difference, according to the authors, is the much larger set of freely available images for the older authors.

This finding alone would provide significant evidence of the value of a robust public domain. But HEK want to go further to estimate the extent to which the addition of photographs on Wikipedia pages represents social welfare. First, they consider what it would have cost Wikipedia to license the public domain images. The same or similar images could have been licensed from Corbis or Getty for about $120 each, so HEK estimate that the Wikipedia page builders saved $77,400 over a five-year period. Extrapolating to Wikipedia as a whole, this would amount to a savings of about a quarter of a billion dollars per year.
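The licensing figures can be checked with simple arithmetic. The sketch below only backs out what the two reported numbers imply; it assumes the $120 fee is a flat per-image charge covering the whole five-year period, which the review does not state, and the function names are hypothetical. The inputs to the Wikipedia-wide extrapolation are not given here, so that step is not reproduced.

```python
LICENSE_FEE_USD = 120          # approximate Corbis/Getty fee per image
REPORTED_SAVINGS_USD = 77_400  # savings reported for the sample
PERIOD_YEARS = 5               # period over which the savings accrue


def implied_image_count(total_savings: float, fee_per_image: float) -> float:
    """Number of licensed images that would produce the reported savings.

    Assumes one flat fee per image for the whole period.
    """
    return total_savings / fee_per_image


def savings_per_year(total_savings: float, years: int) -> float:
    """Average yearly savings over the period."""
    return total_savings / years


if __name__ == "__main__":
    print(implied_image_count(REPORTED_SAVINGS_USD, LICENSE_FEE_USD))
    print(savings_per_year(REPORTED_SAVINGS_USD, PERIOD_YEARS))
```

Under that assumption, the reported savings correspond to 645 image licenses, or $15,480 per year for the sample.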

HEK also attempt to estimate the social value added by the public domain images by comparing the number of page views for pages with and without photographs. The inclusion of photographs increases traffic to webpages (although the precise mechanism isn’t spelled out clearly), and increased traffic means increased advertising revenue (at least to pages that accept ads). The authors measure changes in page views from 2009 to 2014 for those authors who had an image added to their pages after 2009 and for those that did not. Presumably, this should help isolate the draw of the image. HEK estimate that the addition of an image increased page views by 19% during their sample period. Each additional page view is worth about $0.005 in additional ad revenue, so again HEK attempt to assess the hypothetical revenue that Wikipedia could be making based on its use of public domain images. They calculate that the increased traffic to Wikipedia from public domain images is worth about $38 million per year.
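The traffic calculation follows the same pattern: extra views attributable to an image, multiplied by the value of a view. The sketch below is an illustration under stated assumptions rather than HEK's model. The function names and the baseline view count in the example are hypothetical, and the calculation treats the 19% increase as applying uniformly to every page.

```python
UPLIFT = 0.19                # estimated increase in page views from an image
VALUE_PER_VIEW_USD = 0.005   # approximate ad revenue per additional view
REPORTED_ANNUAL_VALUE_USD = 38_000_000


def extra_ad_value(baseline_views: float,
                   uplift: float = UPLIFT,
                   value_per_view: float = VALUE_PER_VIEW_USD) -> float:
    """Hypothetical ad value of the extra views an image brings to a page."""
    return baseline_views * uplift * value_per_view


def implied_extra_views(annual_value: float,
                        value_per_view: float = VALUE_PER_VIEW_USD) -> float:
    """Additional yearly views implied by a reported annual dollar value."""
    return annual_value / value_per_view


if __name__ == "__main__":
    # 100,000 baseline views is a made-up figure for illustration only.
    print(round(extra_ad_value(100_000), 2))
    print(round(implied_extra_views(REPORTED_ANNUAL_VALUE_USD)))
```

Read backwards, the $38 million figure implies roughly 7.6 billion additional page views per year at half a cent per view.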

HEK conclude by offering policy recommendations regarding the harm of copyright term extension and the value of orphan works legislation. While these suggestions are important, the greatest value from this project, and the others that CREATe is producing, is the richer picture of the copyright landscape that they provide. When the next round of copyright legislation begins, both sides will be armed with quantitative figures about costs and benefits.

Finally, in the spirit of further encouragement and next steps, I would like to see more sophisticated analysis of the data, including regression analysis of the initial data set. HEK’s claims about the data would be bolstered with a fuller impression of the effects of each of their variables. Additionally, to deal with endogeneity problems associated with the existence of photographs and other variables, the authors could consider approaching the problem experimentally by randomly assigning different Wikipedia pages to receive a photograph. This might provide greater explanatory power about the relationship between images and page views.
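As a rough illustration of the experimental design suggested above, the sketch below randomly splits a list of pages into a group that receives a photograph and a group that does not, then compares average growth in page views between the two. Everything here is hypothetical: the function names, the fixed seed, and the idea of summarizing each page by a single growth figure are assumptions, and a real study would also need a proper statistical test.

```python
import random
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def assign_treatment(page_titles: Iterable[str],
                     seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Randomly split pages into (treated, control) groups of equal size.

    With an odd number of pages, the control group gets the extra page.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(page_titles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def difference_in_mean_growth(growth_by_page: Dict[str, float],
                              treated: Set[str],
                              control: Set[str]) -> float:
    """Mean page-view growth of treated pages minus that of control pages."""
    treated_mean = mean(growth_by_page[p] for p in treated)
    control_mean = mean(growth_by_page[p] for p in control)
    return treated_mean - control_mean
```

Because assignment is random, any systematic difference in growth between the two groups can be attributed to the photograph rather than to whatever made a page likely to acquire one on its own.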

Cite as: Christopher J. Buccafusco, Estimating the Value of the Public Domain, JOTWELL (June 17, 2015) (reviewing Paul J. Heald, Kris Erickson and Martin Kretschmer, The Valuation of Unprotected Works: A Case Study of Public Domain Photographs on Wikipedia, 28 Harvard J.L. & Tech. (forthcoming, 2015), available at SSRN),

1 comment
  1. 1

    I found that the sloppiest math included in producing the “$246 million a year” was assuming that an article about a New York Times bestseller list author’s biography can represent the average potential Wikipedia-driven income for all Wikipedia pages.

    In other words, I understand how an illustrated biography of Harper Lee might encourage me to go buy To Kill a Mockingbird on Amazon a bit more than a non-illustrated biography of Harper Lee would encourage me. However, I don’t see how an illustrated article about an asteroid would encourage me to go buy something, any more than a non-illustrated asteroid article would.

    Apparently, the fields of social science and journalism have both been inherited by complete nitwits.