Clustering, or unsupervised classification, has many uses in fields that depend on grouping results from large amount of data, an example being the N-body cosmological simulation in astrophysics. In this paper, we study a particular clustering algorithm used in astrophysics, called HOP, and present a parallel implementation to speed up its current sequential implementation. Our approach first builds in parallel the spatial domain hierarchical data structure, a three-dimensional KD tree. Using a KD tree, the core of the HOP algorithm that searches for the highest density neighbor can be performed using only subsets of the particles and hence the communication cost is reduced. We evaluate our implementation by using data sets from a production cosmological application. The experimental results demonstrate up to 24× speedup using 64 processors on three parallel processing machines.