Patent Similarity Data and Innovation Metrics

Ryan Whalen*, Alina Lungeanu, Leslie DeChurch, Noshir Contractor

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

18 Scopus citations


We introduce and describe the Patent Similarity Dataset, comprising vector space model-based similarity scores for U.S. utility patents. The dataset provides approximately 640 million pre-calculated similarity scores, as well as the code and computed vectors required to calculate further pairwise similarities. In addition to the raw data, we introduce measures that leverage patent similarity to provide insight into innovation and intellectual property law issues of interest to both scholars and policymakers. Code is provided in accompanying scripts to assist researchers in obtaining the dataset, joining it with other available patent data, and using it in their research.
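As an illustration of what a vector space similarity score between two patents can look like, the following minimal sketch computes pairwise scores from a small set of vectors. It assumes cosine similarity, a common choice for vector space models; the abstract does not name the measure. The patent identifiers, the toy vectors, and the function names are hypothetical and are not taken from the dataset or its accompanying scripts.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Score every unordered pair of entries in a {patent_id: vector} mapping."""
    ids = sorted(vectors)
    return {
        (p, q): cosine_similarity(vectors[p], vectors[q])
        for i, p in enumerate(ids)
        for q in ids[i + 1:]
    }


# Hypothetical toy vectors keyed by made-up identifiers (not real patent data).
patent_vectors = {
    "A": [1.0, 0.0, 2.0],
    "B": [2.0, 0.0, 4.0],
    "C": [0.0, 3.0, 0.0],
}

scores = pairwise_similarities(patent_vectors)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Scoring every pair grows quadratically with the number of patents, which is one reason a dataset of pre-calculated scores, together with the vectors needed to compute further pairs on demand, is convenient for researchers.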

Original language: English (US)
Pages (from-to): 615-639
Number of pages: 25
Journal: Journal of Empirical Legal Studies
Issue number: 3
State: Published - Sep 1 2020

ASJC Scopus subject areas

  • Education
  • Law

