Methods for exploring and mining tables on Wikipedia

Chandra Sekhar Bhagavatula, Thanapon Noraset, Douglas C Downey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

81 Scopus citations

Abstract

Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover "interesting" relationships between table columns. We find that a "Semantic Relatedness" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
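The abstract credits most of the table-joining improvement to a "Semantic Relatedness" measure built on the Wikipedia link structure, but does not define it. As an illustration only, the sketch below computes the widely used Milne-Witten link-based relatedness between two Wikipedia articles from the sets of articles that link to each one. Whether WikiTables uses this exact formula is an assumption, and the function name `link_relatedness` and its parameters are hypothetical.

```python
import math


def link_relatedness(inlinks_a: set[str], inlinks_b: set[str], total_articles: int) -> float:
    """Milne-Witten style relatedness of two articles, given their in-link sets.

    Returns 1 minus a normalized distance based on shared in-links,
    clamped to [0, 1]. Illustrative sketch; not taken from WikiTables.
    """
    # No in-links on either side, or no overlap: treat as unrelated.
    shared = inlinks_a & inlinks_b
    if not shared:
        return 0.0

    larger = max(len(inlinks_a), len(inlinks_b))
    smaller = min(len(inlinks_a), len(inlinks_b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the smaller in-link count")

    # Normalized distance: 0 when the in-link sets are identical,
    # growing as the overlap shrinks relative to the larger set.
    distance = (math.log(larger) - math.log(len(shared))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Weakly related pairs can give distance > 1, so clamp at 0.
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Toy in-link sets over a hypothetical 1,000-article corpus.
    a = {"x", "y", "z", "w"}
    b = {"y", "z", "v"}
    print(round(link_relatedness(a, b, 1000), 3))
```

Scoring by shared in-links rather than text similarity is what lets a measure like this relate entities that rarely co-occur in prose but are linked from the same pages.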

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013
Publisher: Association for Computing Machinery
Pages: 18-26
Number of pages: 9
ISBN (Print): 9781450323291
State: Published - 2013
Event: ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013 - Chicago, IL, United States
Duration: Aug 11 2013 → Aug 11 2013

Publication series

Name: Proceedings of the ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013

Other

Other: ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013
Country/Territory: United States
City: Chicago, IL
Period: 8/11/13 → 8/11/13

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Information Systems
