The BigDAWG polystore system and architecture

Vijay Gadepally*, Peinan Chen, Jennie M Rogers, Aaron Elmore, Brandon Haynes, Jeremy Kepner, Samuel Madden, Tim Mattson, Michael Stonebraker

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands and a middleware that provides a uniform multi-island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans.

Original language: English (US)
Title of host publication: 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509035250
DOI: https://doi.org/10.1109/HPEC.2016.7761636
State: Published - Nov 28 2016
Event: 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016 - Waltham, United States
Duration: Sep 13 2016 → Sep 15 2016

Publication series

Name: 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016

Other

Other: 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016
Country: United States
City: Waltham
Period: 9/13/16 → 9/15/16

Fingerprint

Data Management
Data Model
Information management
Database Systems
Transparency
Time Series Data
Middleware
Large Data Sets
Waveform
Interfaces (computer)
Programming Model
Data structures
Time series
Completeness
Engine
Semantics
Architecture
Big data
Prototype
Engines

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Hardware and Architecture
  • Computational Mathematics

Cite this

Gadepally, V., Chen, P., Rogers, J. M., Elmore, A., Haynes, B., Kepner, J., ... Stonebraker, M. (2016). The BigDAWG polystore system and architecture. In 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016 [7761636] (2016 IEEE High Performance Extreme Computing Conference, HPEC 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HPEC.2016.7761636
Gadepally, Vijay ; Chen, Peinan ; Rogers, Jennie M ; Elmore, Aaron ; Haynes, Brandon ; Kepner, Jeremy ; Madden, Samuel ; Mattson, Tim ; Stonebraker, Michael. / The BigDAWG polystore system and architecture. 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016. Institute of Electrical and Electronics Engineers Inc., 2016. (2016 IEEE High Performance Extreme Computing Conference, HPEC 2016).
@inproceedings{a55aeb64698d4e718f5107bce119eec2,
title = "The BigDAWG polystore system and architecture",
abstract = "Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands and a middleware that provides a uniform multi-island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans.",
author = "Vijay Gadepally and Peinan Chen and Rogers, {Jennie M} and Aaron Elmore and Brandon Haynes and Jeremy Kepner and Samuel Madden and Tim Mattson and Michael Stonebraker",
year = "2016",
month = "11",
day = "28",
doi = "10.1109/HPEC.2016.7761636",
language = "English (US)",
series = "2016 IEEE High Performance Extreme Computing Conference, HPEC 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2016 IEEE High Performance Extreme Computing Conference, HPEC 2016",
address = "United States",

}

Gadepally, V, Chen, P, Rogers, JM, Elmore, A, Haynes, B, Kepner, J, Madden, S, Mattson, T & Stonebraker, M 2016, The BigDAWG polystore system and architecture. in 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016., 7761636, 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016, Institute of Electrical and Electronics Engineers Inc., 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016, Waltham, United States, 9/13/16. https://doi.org/10.1109/HPEC.2016.7761636

TY - GEN
T1 - The BigDAWG polystore system and architecture
AU - Gadepally, Vijay
AU - Chen, Peinan
AU - Rogers, Jennie M
AU - Elmore, Aaron
AU - Haynes, Brandon
AU - Kepner, Jeremy
AU - Madden, Samuel
AU - Mattson, Tim
AU - Stonebraker, Michael
PY - 2016/11/28
Y1 - 2016/11/28
N2 - Organizations are often faced with the challenge of providing data management solutions for large, heterogeneous datasets that may have different underlying data and programming models. For example, a medical dataset may have unstructured text, relational data, time series waveforms and imagery. Trying to fit such datasets in a single data management system can have adverse performance and efficiency effects. As a part of the Intel Science and Technology Center on Big Data, we are developing a polystore system designed for such problems. BigDAWG (short for the Big Data Analytics Working Group) is a polystore system designed to work on complex problems that naturally span across different processing or storage engines. BigDAWG provides an architecture that supports diverse database systems working with different data models, support for the competing notions of location transparency and semantic completeness via islands and a middleware that provides a uniform multi-island interface. Initial results from a prototype of the BigDAWG system applied to a medical dataset validate polystore concepts. In this article, we will describe polystore databases, the current BigDAWG architecture and its application on the MIMIC II medical dataset, initial performance results and our future development plans.
UR - http://www.scopus.com/inward/record.url?scp=85007028195&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85007028195&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2016.7761636
DO - 10.1109/HPEC.2016.7761636
M3 - Conference contribution
T3 - 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016
BT - 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
ER -

Gadepally V, Chen P, Rogers JM, Elmore A, Haynes B, Kepner J et al. The BigDAWG polystore system and architecture. In 2016 IEEE High Performance Extreme Computing Conference, HPEC 2016. Institute of Electrical and Electronics Engineers Inc. 2016. 7761636. (2016 IEEE High Performance Extreme Computing Conference, HPEC 2016). https://doi.org/10.1109/HPEC.2016.7761636