Problematizing and addressing the article-as-concept assumption in Wikipedia

Yilun Lin, Bowen Yu, Andrew Hall, Brent Hecht

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Scopus citations

Abstract

Wikipedia-based studies and systems frequently assume that no two articles describe the same concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors' tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g., "Portland, Oregon" and "History of Portland, Oregon" in the English Wikipedia). We present evidence that this issue can have significant impacts on Wikipedia-based studies and systems, and we introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.

Original language: English (US)
Title of host publication: CSCW 2017 - Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
Publisher: Association for Computing Machinery
Pages: 2052-2067
Number of pages: 16
ISBN (Electronic): 9781450343350
DOI: https://doi.org/10.1145/2998181.2998274
State: Published - Feb 25 2017
Event: 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017 - Portland, United States
Duration: Feb 25 2017 to Mar 1 2017

Publication series

Name: Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW

Other

Other: 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017
Country: United States
City: Portland
Period: 2/25/17 to 3/1/17

Keywords

  • Peer production
  • Sub-article matching problem
  • Wikipedia

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Human-Computer Interaction


Cite this

Lin, Y., Yu, B., Hall, A., & Hecht, B. (2017). Problematizing and addressing the article-as-concept assumption in Wikipedia. In CSCW 2017 - Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 2052-2067). (Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW). Association for Computing Machinery. https://doi.org/10.1145/2998181.2998274