TY - GEN
T1 - The tower of babel.jpg
T2 - 12th International AAAI Conference on Web and Social Media, ICWSM 2018
AU - He, Shiqing
AU - Lin, Allen Yilun
AU - Adar, Eytan
AU - Hecht, Brent
N1 - Funding Information:
This work was funded in part by the U.S. NSF (IIS-1702440, IIS-1707319, CAREER IIS-1707296, and IIS-1421438). We thank Ceren Budak, Clifford Lampe, and Hariharan Subramonyam for their feedback.
Publisher Copyright:
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2018
Y1 - 2018
N2 - Across all Wikipedia language editions, millions of images augment text in critical ways. This visual encyclopedic knowledge is an important form of wikiwork for editors, a critical part of reader experience, an emerging resource for machine learning, and a lens into cultural differences. However, Wikipedia research, and cross-language edition Wikipedia research in particular, has thus far been limited to text. In this paper, we assess the diversity of visual encyclopedic knowledge across 25 language editions and compare our findings to those reported for textual content. Unlike text, translation in images is largely unnecessary. Additionally, the Wikimedia Foundation, through the Wikipedia Commons, has taken steps to simplify cross-language image sharing. While we may expect that these factors would reduce image diversity, we find that cross-language image diversity rivals, and often exceeds, that found in text. We find that diversity varies between language pairs and content types, but that many images are unique to different language editions. Our findings have implications for readers (in what imagery they see), for editors (in deciding what images to use), for researchers (who study cultural variations), and for machine learning developers (who use Wikipedia for training models).
UR - http://www.scopus.com/inward/record.url?scp=85050586032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050586032&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85050586032
T3 - 12th International AAAI Conference on Web and Social Media, ICWSM 2018
SP - 102
EP - 111
BT - 12th International AAAI Conference on Web and Social Media, ICWSM 2018
PB - AAAI Press
Y2 - 25 June 2018 through 28 June 2018
ER -