Project Details
Description
Overview:
Visualizations can greatly enhance textual content, supporting better understanding of complex facts, creating context, and aiding decision makers. In this work, we propose to design and implement a mechanism for automatically generating contextually relevant visualizations for news articles, similar to the manually created visualizations (e.g., maps, line graphs, bar charts) that increasingly appear in major news outlets. By utilizing novel natural language and text mining approaches, the system we envision can define a set of "queries" that encode the topic of an article, the data it references (e.g., "unemployment in CA in March," "global average temperatures in 2012"), and the comparisons that are made in the article’s text (e.g., differences between states or over time). Given these queries, we are able to search for relevant datasets, construct views (data subsets), and automatically generate contextually relevant visualizations for the input article.
While the visualizations act to supplement the articles by providing graphical support for presented information, we utilize corpora of documents (e.g., a large corpus of recent news articles) to enhance the visualizations. By identifying "interesting" features in a visualization that are associated with content in the corpora, the system can generate a set of annotations that "explain" these features and provide navigation support through the corpus, thus allowing the end user to "pivot" between visualizations and articles.
Automatically designing article-accompanying visualizations requires satisfying a number of concerns, including relevance (visualized datasets must be related to the article), expressiveness (visualizations should utilize the appropriate graphical form for the data and comparisons implied in the article), effectiveness (visualizations should use graphical marks that are the best perceptual encodings for the data given the document text, including the implicit and explicit values and comparisons referenced in the text), and interestingness (visualizations should depict interesting patterns and offer explanations through annotations). The specific strategies for extracting the queries from article text, generating visualizations for the queries, identifying and evaluating the features that operationalize identified design concerns, and exploring the tradeoffs inherent in these concerns represent the technical contributions of this work.
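To make the idea of a "query" and the four design concerns more concrete, the following is a minimal illustrative sketch, not the project's implementation. Every name in it (`Query`, `Candidate`, `preferred_form`, `score`, `rank`), the mapping from comparison type to chart form, the equal weights, and the numeric scores are hypothetical assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical encoding of what an article references."""
    measure: str               # e.g. "unemployment"
    place: Optional[str]       # e.g. "CA"
    period: Optional[str]      # e.g. "March"
    comparison: str            # assumed labels: "across_places", "over_time", "none"


@dataclass(frozen=True)
class Candidate:
    """A candidate visualization with one assumed score in [0, 1] per design concern."""
    form: str                  # e.g. "map", "line", "bar"
    relevance: float
    expressiveness: float
    effectiveness: float
    interestingness: float


# Assumption: equal weighting of the four concerns; real tradeoffs may differ.
EQUAL_WEIGHTS: Dict[str, float] = {
    "relevance": 0.25,
    "expressiveness": 0.25,
    "effectiveness": 0.25,
    "interestingness": 0.25,
}


def preferred_form(query: Query) -> str:
    """Assumed rule of thumb: spatial comparisons suggest maps, temporal ones lines."""
    return {"across_places": "map", "over_time": "line"}.get(query.comparison, "bar")


def score(candidate: Candidate, weights: Dict[str, float] = EQUAL_WEIGHTS) -> float:
    """Weighted sum of the per-concern scores."""
    return sum(w * getattr(candidate, name) for name, w in weights.items())


def rank(candidates: List[Candidate],
         weights: Dict[str, float] = EQUAL_WEIGHTS) -> List[Candidate]:
    """Order candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Made-up example values, loosely following "unemployment in CA in March".
query = Query("unemployment", "CA", "March", "across_places")
candidates = [
    Candidate("bar", 0.9, 0.6, 0.8, 0.3),
    Candidate("line", 0.4, 0.5, 0.6, 0.2),
    Candidate("map", 0.9, 0.8, 0.7, 0.9),
]
ranked = rank(candidates)
print(preferred_form(query), [c.form for c in ranked])
```

Changing the weights passed to `rank` is one simple way to explore the tradeoffs among the concerns: weighting effectiveness alone, for instance, reorders these made-up candidates.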
Intellectual Merit:
Our proposed project bridges natural language and text processing, visualization, and cartography to create new techniques and tools while simultaneously making advances in each field. We are developing methods to identify the datasets most relevant to a given document of unstructured text and are constructing new mechanisms to identify and extract data from comparative natural language sentences (and the types of comparisons being made). Our research on visualization explores mechanisms to produce and rank visualizations of many forms (e.g., maps, timelines, scatterplots, bar charts) to accompany text documents according to design concerns such as relevance, effectiveness, expressiveness, and interestingness. Our previous work identified maps as the most prevalent form of article-accompanying visualization, and our proposed work will be the first to explore the fundamental cartographic relationship between the (geo)statistical properties of a dataset and its potential to create an interesting map.
Broader Impacts:
This work has a number of important broader impacts including increasing reader un
Status | Finished |
---|---|
Effective start/end date | 8/1/16 → 8/31/18 |
Funding
- National Science Foundation (IIS-1702440)