TY - GEN
T1 - GPU-accelerated Monte Carlo simulations of dense stellar systems
AU - Pattabiraman, Bharath
AU - Umbreit, Stefan
AU - Liao, Wei-Keng
AU - Rasio, Frederic A
AU - Kalogera, Vicky
AU - Memik, Gokhan
AU - Choudhary, Alok Nidhi
PY - 2012
Y1 - 2012
N2 - Computing the interactions between the stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. However, simulating realistic sized clusters of about 10^6 stars is computationally intensive and often takes a long time to complete. This paper presents the parallelization of a Monte Carlo method-based algorithm for simulating stellar cluster evolution on programmable Graphics Processing Units (GPUs). The kernels of this algorithm involve numerical methods of root-bisection and von Neumann rejection. Our experiments show that although these kernels exhibit data dependent decision making and unavoidable non-contiguous memory accesses, the GPU can still deliver substantial near-linear speed-ups which is unlikely to be achieved on a CPU-based system. For problem sizes ranging from 10^6 to 7 × 10^6 stars, we obtain up to 28x speedups for these kernels, and a 2x overall application speedup on an NVIDIA GTX280 GPU over the sequential version run on an AMD
AB - Computing the interactions between the stars within dense stellar clusters is a problem of fundamental importance in theoretical astrophysics. However, simulating realistic sized clusters of about 10^6 stars is computationally intensive and often takes a long time to complete. This paper presents the parallelization of a Monte Carlo method-based algorithm for simulating stellar cluster evolution on programmable Graphics Processing Units (GPUs). The kernels of this algorithm involve numerical methods of root-bisection and von Neumann rejection. Our experiments show that although these kernels exhibit data dependent decision making and unavoidable non-contiguous memory accesses, the GPU can still deliver substantial near-linear speed-ups which is unlikely to be achieved on a CPU-based system. For problem sizes ranging from 10^6 to 7 × 10^6 stars, we obtain up to 28x speedups for these kernels, and a 2x overall application speedup on an NVIDIA GTX280 GPU over the sequential version run on an AMD
KW - CUDA
KW - Graphics processing unit (GPU)
KW - Monte Carlo simulation
KW - bisection method
KW - multi-scale simulation
KW - parallel random number generator
UR - http://www.scopus.com/inward/record.url?scp=84870698656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870698656&partnerID=8YFLogxK
U2 - 10.1109/InPar.2012.6339600
DO - 10.1109/InPar.2012.6339600
M3 - Conference contribution
AN - SCOPUS:84870698656
SN - 9781467326322
T3 - 2012 Innovative Parallel Computing, InPar 2012
BT - 2012 Innovative Parallel Computing, InPar 2012
T2 - 2012 Innovative Parallel Computing, InPar 2012
Y2 - 13 May 2012 through 14 May 2012
ER -