TY - GEN
T1 - Predictive modeling and scalability analysis for large graph analytics
AU - Medya, Sourav
AU - Cherkasova, Ludmila
AU - Singh, Ambuj
N1 - Funding Information:
Research was partially supported by the IIS-1219254 grant from the NSF.
Publisher Copyright:
© 2017 IFIP.
PY - 2017/7/20
Y1 - 2017/7/20
N2 - Many HPC and modern large graph processing applications belong to a class of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Assessing application scalability is one of the primary goals when implementing such applications. Typically, in the design phase, programmers are limited to the small cluster available for their experiments. Therefore, predictive modeling is required to analyze the application's scalability and performance in a larger cluster. Although each node in a larger cluster processes a smaller portion of the original dataset, the higher communication volume among a larger number of nodes may cripple application scalability and yield diminishing performance benefits. One of the main challenges is analyzing the bandwidth demands caused by the increased communication volume in a larger cluster. In this paper, we introduce a novel regression-based approach to assess the scalability and performance of a distributed memory program for execution in a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed in a small cluster and 2) an additional set of similar experiments performed with an "interconnect bandwidth throttling" tool, which exposes the impact of bandwidth on application performance. These measurements are used to create an ensemble of analytical models for performance and scalability analysis. Using a linear regression approach, we incorporate the following important parameters into the model step by step: i) the number of cluster nodes and application processes, ii) the dataset size, and iii) the interconnect bandwidth. We demonstrate our solution, its power, and its accuracy using the popular Graph500 benchmark, which implements a Breadth First Search algorithm on large, synthetically generated graphs.
Using measurements collected in a 32-node cluster, we are able to project the program's performance in a large cluster with hundreds of nodes. The proposed approach and derived models help provide early feedback to programmers on the scalability and efficiency of their solution.
AB - Many HPC and modern large graph processing applications belong to a class of scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Assessing application scalability is one of the primary goals when implementing such applications. Typically, in the design phase, programmers are limited to the small cluster available for their experiments. Therefore, predictive modeling is required to analyze the application's scalability and performance in a larger cluster. Although each node in a larger cluster processes a smaller portion of the original dataset, the higher communication volume among a larger number of nodes may cripple application scalability and yield diminishing performance benefits. One of the main challenges is analyzing the bandwidth demands caused by the increased communication volume in a larger cluster. In this paper, we introduce a novel regression-based approach to assess the scalability and performance of a distributed memory program for execution in a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed in a small cluster and 2) an additional set of similar experiments performed with an "interconnect bandwidth throttling" tool, which exposes the impact of bandwidth on application performance. These measurements are used to create an ensemble of analytical models for performance and scalability analysis. Using a linear regression approach, we incorporate the following important parameters into the model step by step: i) the number of cluster nodes and application processes, ii) the dataset size, and iii) the interconnect bandwidth. We demonstrate our solution, its power, and its accuracy using the popular Graph500 benchmark, which implements a Breadth First Search algorithm on large, synthetically generated graphs.
Using measurements collected in a 32-node cluster, we are able to project the program's performance in a large cluster with hundreds of nodes. The proposed approach and derived models help provide early feedback to programmers on the scalability and efficiency of their solution.
UR - http://www.scopus.com/inward/record.url?scp=85029453663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029453663&partnerID=8YFLogxK
U2 - 10.23919/INM.2017.7987265
DO - 10.23919/INM.2017.7987265
M3 - Conference contribution
AN - SCOPUS:85029453663
T3 - Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network and Service Management
SP - 63
EP - 71
BT - Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network and Service Management
A2 - Chemouil, Prosper
A2 - Simoes, Paulo
A2 - Madeira, Edmundo
A2 - Secci, Stefano
A2 - Monteiro, Edmundo
A2 - Gaspary, Luciano Paschoal
A2 - dos Santos, Carlos Raniery P.
A2 - Charalambides, Marinos
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IFIP/IEEE International Symposium on Integrated Network and Service Management, IM 2017
Y2 - 8 May 2017 through 12 May 2017
ER -