VizByWiki: Mining data visualizations from the web to enrich news articles

Allen Yilun Lin, Joshua Ford, Eytan Adar, Brent Hecht

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Data visualizations in news articles (e.g., maps, line graphs, bar charts) greatly enrich the content of news articles and result in well-established improvements to reader comprehension. However, existing systems that generate news data visualizations either require substantial manual effort or are limited to very specific types of data visualizations, thereby greatly restricting the number of news articles that can be enhanced. To address this issue, we define a new problem: given a news article, retrieve relevant visualizations that already exist on the web. We show that this problem is tractable through a new system, VizByWiki, that mines contextually relevant data visualizations from Wikimedia Commons, the central file repository for Wikipedia. Using a novel ground truth dataset, we show that VizByWiki can successfully augment as many as 48% of popular online news articles with news visualizations. We also demonstrate that VizByWiki can automatically rank visualizations according to their usefulness with reasonable accuracy (nDCG@5 of 0.82). To facilitate further advances on our "news visualization retrieval problem", we release our ground truth dataset and make our system and its source code publicly available.
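For readers unfamiliar with the ranking metric quoted in the abstract, the following is a minimal sketch of how nDCG@5 is commonly computed. It is not taken from the VizByWiki source code: the function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the linear-gain, log2-discount formulation is an assumption, since the abstract does not state which nDCG variant the authors used.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items.

    Assumes linear gain and a log2 position discount (one common variant).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=5):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        # No relevant items at all: define the score as 0 (an assumption).
        return 0.0
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical usefulness ratings of retrieved visualizations, in ranked order.
perfect_ranking = [3, 2, 1, 0, 0]
swapped_ranking = [0, 1]
print(ndcg_at_k(perfect_ranking))
print(ndcg_at_k(swapped_ranking))
```

A score of 1.0 means the system's ordering matches the ideal ordering of the rated items; lower values indicate that more useful visualizations were ranked below less useful ones.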

Original language: English (US)
Title of host publication: The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018
Publisher: Association for Computing Machinery, Inc
Pages: 873-882
Number of pages: 10
ISBN (Electronic): 9781450356398
State: Published - Apr 10 2018
Event: 27th International World Wide Web, WWW 2018 - Lyon, France
Duration: Apr 23 2018 → Apr 27 2018

Publication series

Name: The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018

Conference

Conference: 27th International World Wide Web, WWW 2018
Country/Territory: France
City: Lyon
Period: 4/23/18 → 4/27/18

Keywords

  • Data visualizations
  • News articles
  • Peer production
  • User-generated content
  • Wikimedia Commons
  • Wikipedia

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
