TY - GEN
T1 - Methods for exploring and mining tables on Wikipedia
AU - Bhagavatula, Chandra Sekhar
AU - Noraset, Thanapon
AU - Downey, Douglas C
PY - 2013
Y1 - 2013
N2 - Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover "interesting" relationships between table columns. We find that a "Semantic Relatedness" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
AB - Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present WikiTables, a Web application that enables users to interactively explore tabular knowledge extracted from Wikipedia. In experiments, we show that WikiTables substantially outperforms baselines on the novel task of automatically joining together disparate tables to uncover "interesting" relationships between table columns. We find that a "Semantic Relatedness" measure that leverages the Wikipedia link structure accounts for a majority of this improvement. Further, on the task of keyword search for tables, we show that WikiTables performs comparably to Google Fusion Tables despite using an order of magnitude fewer tables. Our work also includes the release of a number of public resources, including over 15 million tuples of extracted tabular data, manually annotated evaluation sets, and public APIs.
UR - http://www.scopus.com/inward/record.url?scp=84887450627&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887450627&partnerID=8YFLogxK
U2 - 10.1145/2501511.2501516
DO - 10.1145/2501511.2501516
M3 - Conference contribution
AN - SCOPUS:84887450627
SN - 9781450323291
T3 - Proceedings of the ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013
SP - 18
EP - 26
BT - Proceedings of the ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013
PB - Association for Computing Machinery
T2 - ACM SIGKDD 2013 Workshop on Interactive Data Exploration and Analytics, IDEA 2013
Y2 - 11 August 2013 through 11 August 2013
ER -