TY - GEN
T1 - Problematizing and addressing the article-as-concept assumption in Wikipedia
AU - Lin, Yilun
AU - Yu, Bowen
AU - Hall, Andrew
AU - Hecht, Brent
N1 - Funding Information:
The authors would like to thank Stephanie Hernandez, Geovanna Hinojoza, Federico Peredes, Stephanie Hecht, Patti Bao, and Darren Gergle for their valuable contributions to this project. This project was funded by NSF IIS-1552955, NSF IIS-1526988, and NSF IIS-1421655.
Publisher Copyright:
© 2017 ACM.
PY - 2017/2/25
Y1 - 2017/2/25
AB - Wikipedia-based studies and systems frequently assume that no two articles describe the same concept. However, we show that this article-as-concept assumption is problematic because editors tend to split articles into parent articles and sub-articles when articles get too long for readers (e.g., "Portland, Oregon" and "History of Portland, Oregon" in the English Wikipedia). We present evidence that this issue can have significant impacts on Wikipedia-based studies and systems, and we introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.
KW - Peer production
KW - Sub-article matching problem
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85014783224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014783224&partnerID=8YFLogxK
U2 - 10.1145/2998181.2998274
DO - 10.1145/2998181.2998274
M3 - Conference contribution
AN - SCOPUS:85014783224
T3 - Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW
SP - 2052
EP - 2067
BT - CSCW 2017 - Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing
PB - Association for Computing Machinery
T2 - 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW 2017
Y2 - 25 February 2017 through 1 March 2017
ER -