Of Gene Sequences and IP’s Ex Post Incentives: An Empirical Measurement of the Effect of “Celera’s IP”

Heidi L. Williams, Intellectual Property Rights and Innovation: Evidence from the Human Genome (2010).

Empirical studies of IP that measure the effect of IP on innovation are difficult to pull off. The cleanest way to measure the effect of IP on innovation would be to run a controlled experiment in a laboratory setting: take two similarly situated groups of innovators, subject one group to a regime of exclusive rights and the other to a public-domain regime, and then sit back and watch the differences that evolve in the two groups. Unfortunately for economists, innovators cannot be treated like laboratory rats, so actively creating the control group that is required to measure the effect of IP on innovation ranges from the difficult, indirect, and expensive to the impossible. We usually have to make educated guesses about counterfactual scenarios: we just do not know for sure what would have happened if a real-world IP regime had not existed or had existed in a different form.

In her July 2010 working paper, Intellectual Property Rights and Innovation: Evidence from the Human Genome (available as NBER Working Paper No. 16213), Professor Heidi L. Williams, an economist at MIT, overcomes the inability of scientists to create an experimental control group by identifying a rare natural experiment: a situation in which the real world provides two similarly situated groups, one of which is subject to an IP regime and one of which is not. In Williams’ words, “[t]he contribution of this paper is to identify an empirical context in which there is variation in IP across a relatively large group of ex ante similar technologies, and to trace out the impacts of IP . . . .” (P. 1.)

Williams’ subject is the sequencing of the human genome. In the late 1990s and early 2000s, the human genome was sequenced simultaneously by two different entities: the public Human Genome Project (HGP) and the private firm Celera. Each entity had its own policy concerning IP. The HGP mandated that all of its sequence data be deposited with little delay into a publicly accessible database, and it did not impose any restrictions on the use of the data. In contrast, Celera sought to monetize its database of sequence information with a form of contract-based IP that Williams refers to as “Celera’s IP.”¹ Williams uses differences in the progress of the two entities’ efforts to identify two different groups of gene-sequence data. First, there were the gene sequences that had already been produced by the HGP as of 2001 and thus were available in the public domain as of that date, even if they were also available in the Celera database. Second, there were the “Celera genes”: those genes whose sequences were only available in Celera’s private database as of 2001 and thus could only be accessed by those who followed the conditions established by Celera’s IP. By 2003, all Celera-gene sequences had been disclosed by the HGP and were in the public domain, so Celera’s IP was temporally limited to roughly a two-year span.

Williams measures the effect of Celera’s IP on the subsequent innovation on the Celera genes, using the subsequent innovation on the genes whose sequences were available in the HGP database by 2001 as a stand-in for a control group. Looking at both the Celera genes and the remaining genes, she observes the appearance of scientific publications addressing gene function and the development of gene-based diagnostic tests for health-care consumers through 2009. Williams finds that Celera’s IP had a significant dampening effect on subsequent innovation on the Celera genes, on the order of thirty percent, both in terms of scientific publications and diagnostic tests.

Williams is careful to note that her data should not be interpreted as providing a direct assessment of the overall welfare effects of Celera’s privately funded efforts to sequence the human genome. She is unable to capture data on the social value of the more rapid availability of the sequence data that is attributable to Celera’s efforts or, more generally, on the value of IP’s ex ante incentives to innovate. What she claims to demonstrate is only that Celera’s IP did not serve the ex post function of providing incentives to further investigate the Celera genes or develop their commercial potential. In fact, Celera’s IP seems to have done just the opposite: it seems to have retarded post-discovery innovation on the Celera genes.

Williams’ analysis is not as simple as this short review suggests. At times engaging in statistical analysis that is difficult for a non-economist (like me) to follow, let alone to evaluate, Williams discusses her research design at length, including her extensive efforts to rule out the possibility of selection bias—a non-random initial sorting of genes into the Celera and non-Celera groups based on their promise as a target of academic inquiry or a commercially valuable therapeutic. Nonetheless, the bulk of the paper is accessible to non-economists, and its lessons for anyone interested in understanding the real-world impact of intellectual property make it a must-read piece for intellectual property scholars.

From the perspective of a scholar of intellectual property, however, there is one important issue on which Williams does not dwell: What precisely is the nature of Celera’s IP? Celera’s IP is an unusually weak form of IP, a fact that makes Williams’ findings even more compelling. As defined by Williams, Celera’s IP had nothing to do with patents. In fact, Williams could not match data about gene patents to her data, leaving her “unable to examine patenting as either an outcome or as a potential mechanism for the observed Celera IP effects.” (P. 11, n. 32.) Nor was Celera’s IP a form of trade secrecy. As of 2001, Celera’s publication of its draft genome in Science meant that any member of the public could access and view Celera’s database. Rather, Celera’s IP existed principally in two contractual terms to which anyone who accessed Celera’s database had to agree. First, nonprofit researchers could use the data free of charge for research purposes, but anyone interested in using the Celera data for commercial purposes had to negotiate a commercial-user license with Celera. Data on the commercial-user licenses is not public, but Celera is rumored to have required a significant subscription fee that ran in the millions of dollars per year for pharmaceutical companies. Second, to prevent access to the data by commercial users who did not pay the subscription fee, the license generally prohibited redistribution of the data, but Celera would deposit sequence data into a publicly available database if such deposition was required for publication of research results. If this relatively weak form of IP had the persistent negative effects on both subsequent scientific research and commercial product development that Williams documents, one can only wonder what effects stronger and more common forms of IP might have had.

Williams’ paper examines a natural experiment to provide valuable insight into the effect of IP on innovation, but, as is perhaps inevitable in the world of empirical research, it only whets our appetite, leaving us to our educated guesses about what might happen in yet other counterfactual scenarios.

¹ Williams uses the phrase “Celera’s IP” in an ambiguous fashion to refer both to the set of gene sequences that were only available in Celera’s database (the res) and the set of legal rights that Celera exercised with respect to the gene sequences in the database (the rights). I use the phrase only in the latter sense, and I refer to the former concept as the “Celera genes.”

Cite as: Kevin E. Collins, Of Gene Sequences and IP’s Ex Post Incentives: An Empirical Measurement of the Effect of “Celera’s IP”, JOTWELL (September 6, 2011) (reviewing Heidi L. Williams, Intellectual Property Rights and Innovation: Evidence from the Human Genome (2010)), https://ip.jotwell.com/of-gene-sequences-and-ips-ex-post-incentives-an-empirical-measurement-of-the-effect-of-celeras-ip/.