Skew-aware join optimization for array databases

Jennie Duggan, Olga Papaemmanouil, Leilani Battle, Michael Stonebraker

Research output: Chapter in Book/Report/Conference proceedingConference contribution

19 Scopus citations

Abstract

Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take advantage of array tiles. Moreover, most n-dimensional science data is unevenly distributed in array space because its underlying observations rarely follow a uniform pattern. It is crucial that the optimization of array joins be skew-aware. In addition, owing to the scale of science applications, their query processing usually spans multiple nodes. This further complicates the planning of array joins. In this paper, we introduce a join optimization framework that is skew-aware for distributed joins. This optimization consists of two phases. In the first, a logical planner selects the query's algorithm (e.g., merge join), the granularity of the its tiles, and the reorganization operations needed to align the data. The second phase implements this logical plan by assigning tiles to cluster nodes using an analytical cost model. Our experimental results, on both synthetic and real-world data, demonstrate that this optimization framework speeds up array joins by up to 2.5X in comparison to the baseline.

Original languageEnglish (US)
Title of host publicationSIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages123-135
Number of pages13
ISBN (Electronic)9781450327589
DOIs
StatePublished - May 27 2015
EventACM SIGMOD International Conference on Management of Data, SIGMOD 2015 - Melbourne, Australia
Duration: May 31 2015Jun 4 2015

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
Volume2015-May
ISSN (Print)0730-8078

Other

OtherACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Country/TerritoryAustralia
CityMelbourne
Period5/31/156/4/15

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Skew-aware join optimization for array databases'. Together they form a unique fingerprint.

Cite this