Mining keyphrase hierarchies for scientific document search

Project: Research project

Project Details

Description

Our goal is to produce topical organizations of scholarly content. For a corpus of academic papers, we will produce metadata for each paper in the form of keyphrases (e.g., “maximum margin learning”). We will organize the documents into hierarchies of topics, with each node in the hierarchy defined by one or more of the discovered keyphrases. Additionally, using technology we already have, we’ll augment each keyphrase with its corresponding Wikipedia link, if one exists (e.g., http://en.wikipedia.org/wiki/Margin_classifier). We will provide the keyphrases for AI2’s document corpora and release the code that detects keyphrases in a given corpus. All code will be open source. We’ll experimentally evaluate our approach for accuracy and utility. Depending on initial results, we will explore modeling keyphrases as a function of not just document content but also document attributes such as publication date, author, institution, and venue.
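The description does not say how the topic hierarchy is induced from the keyphrases. Purely as an illustration, here is a minimal sketch using the document co-occurrence subsumption heuristic of Sanderson and Croft (1999). This technique is swapped in for the example and is not necessarily the method this project used. Keyphrase x is treated as a parent of y when P(x | y) ≥ threshold and P(y | x) < 1. The function name `build_subsumption_hierarchy`, the input format (document id mapped to a set of keyphrases), and the default `threshold` of 0.8 are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_subsumption_hierarchy(doc_keyphrases, threshold=0.8):
    """Hypothetical helper: return a child -> parent mapping between keyphrases.

    doc_keyphrases maps a document id to the set of keyphrases detected in it.
    Keyphrase x subsumes y when P(x | y) >= threshold and P(y | x) < 1,
    with probabilities estimated from document co-occurrence counts.
    """
    # Inverted index: keyphrase -> set of documents that contain it.
    docs_with = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            docs_with[phrase].add(doc_id)

    parent = {}
    for x, y in permutations(docs_with, 2):
        both = len(docs_with[x] & docs_with[y])
        p_x_given_y = both / len(docs_with[y])
        p_y_given_x = both / len(docs_with[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            # If several keyphrases subsume y, keep the most specific one,
            # i.e. the candidate parent that occurs in the fewest documents.
            current = parent.get(y)
            if current is None or len(docs_with[x]) < len(docs_with[current]):
                parent[y] = x
    return parent


if __name__ == "__main__":
    # Toy corpus with made-up papers, for illustration only.
    docs = {
        "p1": {"machine learning", "maximum margin learning"},
        "p2": {"machine learning", "maximum margin learning"},
        "p3": {"machine learning", "topic models"},
        "p4": {"machine learning", "topic models"},
        "p5": {"machine learning"},
    }
    print(build_subsumption_hierarchy(docs))
```

On the toy corpus, both “maximum margin learning” and “topic models” end up as children of “machine learning”, because every paper containing them also contains the broader phrase but not vice versa. A real system would likely also filter rare keyphrases and use richer signals than co-occurrence alone.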
Status: Finished
Effective start/end date: 6/1/14 to 2/29/16

Funding

  • Allen Institute for Artificial Intelligence (Agmt starting 06/01/14)
