Abstract
Visual data exploration tools allow users to quickly gather insights from new datasets. As dataset sizes continue to in- crease, though, new techniques will be necessary to maintain the interactivity guarantees that these tools require. Ap- proximate query processing (AQP) attempts to tackle this problem and allows systems to return query results at "hu-man speed." However, existing AQP techniques start to break down when confronted with ad hoc queries that tar- get the tails of the distribution. We therefore present an AQP formulation that can pro- vide low-error approximate results at interactive speeds, even for queries over rare subpopulations. In particular, our for- mulation treats query results as random variables in order to leverage the ample opportunities for result reuse inher- ent in interactive data exploration. As part of our approach, we apply a variety of optimization techniques that are based on probability theory, including new query rewrite rules and index structures. We implemented these techniques in a pro- totype system and show that they can achieve interactivity where alternative approaches cannot.
Original language | English (US) |
---|---|
Pages (from-to) | 1142-1153 |
Number of pages | 12 |
Journal | Proceedings of the VLDB Endowment |
Volume | 10 |
Issue number | 10 |
DOIs | |
State | Published - Jun 1 2017 |
Event | 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany Duration: Aug 28 2017 → Sep 1 2017 |
Funding
This research is funded in part by the NSF CAREER Award IIS-1453171, NSF Award IIS-1514491, Air Force YIP AWARD FA9550-15-1-0144, and the Intel Science and Tech- nology Center for Big Data, as well as gifts from Google, VMware, Mellanox, and Oracle.
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science