TY - GEN
T1 - Incremental elasticity for array databases
AU - Duggan, Jennie
AU - Stonebraker, Michael
PY - 2014
Y1 - 2014
N2 - Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skewaware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.
AB - Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skewaware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources.
UR - http://www.scopus.com/inward/record.url?scp=84904359199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904359199&partnerID=8YFLogxK
U2 - 10.1145/2588555.2588569
DO - 10.1145/2588555.2588569
M3 - Conference contribution
AN - SCOPUS:84904359199
SN - 9781450323765
SN - 9781450323765
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 409
EP - 420
BT - SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Y2 - 22 June 2014 through 27 June 2014
ER -