TY - GEN
T1 - Minimizing communication cost in distributed multi-query processing
AU - Li, Jian
AU - Deshpande, Amol
AU - Khuller, Samir
PY - 2009
Y1 - 2009
N2 - Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multiquery processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans.
AB - Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multiquery processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans.
UR - http://www.scopus.com/inward/record.url?scp=67649661838&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649661838&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.85
DO - 10.1109/ICDE.2009.85
M3 - Conference contribution
AN - SCOPUS:67649661838
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 772
EP - 783
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -