Revisiting reuse for approximate query processing

Alex Galakatos, Andrew Crotty, Emanuel Zgraggen, Carsten Binnig, Tim Kraska

Research output: Contribution to journalConference articlepeer-review

58 Scopus citations

Abstract

Visual data exploration tools allow users to quickly gather insights from new datasets. As dataset sizes continue to in- crease, though, new techniques will be necessary to maintain the interactivity guarantees that these tools require. Ap- proximate query processing (AQP) attempts to tackle this problem and allows systems to return query results at "hu-man speed." However, existing AQP techniques start to break down when confronted with ad hoc queries that tar- get the tails of the distribution. We therefore present an AQP formulation that can pro- vide low-error approximate results at interactive speeds, even for queries over rare subpopulations. In particular, our for- mulation treats query results as random variables in order to leverage the ample opportunities for result reuse inher- ent in interactive data exploration. As part of our approach, we apply a variety of optimization techniques that are based on probability theory, including new query rewrite rules and index structures. We implemented these techniques in a pro- totype system and show that they can achieve interactivity where alternative approaches cannot.

Original languageEnglish (US)
Pages (from-to)1142-1153
Number of pages12
JournalProceedings of the VLDB Endowment
Volume10
Issue number10
DOIs
StatePublished - Jun 1 2017
Event43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany
Duration: Aug 28 2017Sep 1 2017

Funding

This research is funded in part by the NSF CAREER Award IIS-1453171, NSF Award IIS-1514491, Air Force YIP AWARD FA9550-15-1-0144, and the Intel Science and Tech- nology Center for Big Data, as well as gifts from Google, VMware, Mellanox, and Oracle.

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Revisiting reuse for approximate query processing'. Together they form a unique fingerprint.

Cite this