High-Precision Extraction of Emerging Concepts from Scientific Literature

Daniel King, Doug Downey, Daniel S. Weld

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification can't keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data.
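The intuition in the abstract, and the Precision@k metric it reports, can be illustrated with a minimal sketch. This is not the authors' released code, and this record does not give the paper's actual scoring function. The function names (`citation_concentration`, `precision_at_k`), the dictionary-based citation graph, and the toy paper IDs are all assumptions made for illustration.

```python
from collections import Counter


def citation_concentration(mentioning_papers, citations):
    """Hypothetical score for one candidate concept.

    mentioning_papers: IDs of papers whose text mentions the candidate phrase.
    citations: dict mapping a paper ID to the IDs of the papers it cites.

    Returns (most_cited_paper, share), where share is the fraction of
    mentioning papers that cite that single paper. A high share is
    consistent with the abstract's intuition that one paper introduced
    or popularized the concept.
    """
    counts = Counter()
    for paper in mentioning_papers:
        # Count each cited paper at most once per mentioning paper.
        for cited in set(citations.get(paper, ())):
            counts[cited] += 1
    if not counts:
        return None, 0.0
    top_paper, n_citing = counts.most_common(1)[0]
    return top_paper, n_citing / len(mentioning_papers)


def precision_at_k(ranked_extractions, correct, k):
    """Fraction of the top-k ranked extractions judged correct."""
    top = ranked_extractions[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in correct) / len(top)


# Toy citation graph (assumed data): p1..p4 mention the candidate phrase.
toy_citations = {"p1": ["a"], "p2": ["a", "b"], "p3": ["a"], "p4": ["c"]}
top_paper, share = citation_concentration(["p1", "p2", "p3", "p4"], toy_citations)
print(top_paper, share)

# Toy ranking (assumed data): two of three extractions are judged correct.
print(precision_at_k(["x", "y", "z"], {"x", "z"}, 2))
```

In this toy graph, three of the four mentioning papers cite paper `a`, so the phrase would score highly under the assumed measure. A phrase whose mentioning papers cite many different sources would score low.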

Original language: English (US)
Title of host publication: SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1549-1552
Number of pages: 4
ISBN (Electronic): 9781450380164
DOIs: https://doi.org/10.1145/3397271.3401235
State: Published - Jul 25 2020
Event: 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020 - Virtual, Online, China
Duration: Jul 25 2020 to Jul 30 2020

Publication series

Name: SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 43rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020
Country: China
City: Virtual, Online
Period: 7/25/20 to 7/30/20

Keywords

  • citation graph
  • concept extraction
  • scientific literature

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Cite this

King, D., Downey, D., & Weld, D. S. (2020). High-Precision Extraction of Emerging Concepts from Scientific Literature. In SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1549-1552). (SIGIR 2020 - Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery, Inc. https://doi.org/10.1145/3397271.3401235