Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions

Ruben Buaba, Abdollah Homaifar, William Hendrix, Seung Woo Son, Wei-Keng Liao, Alok Nidhi Choudhary

Research output: Contribution to journalArticle

Abstract

A randomized algorithm employs a degree of randomness as part of its logic with uniform random bits as an auxiliary input to guide its behavior, in the hope of achieving good average runtime performance over all possible choices of the random bits. In this paper, we formulate a randomized algorithm capable of nding approximate nearest neighbors, specically in high dimensional datasets. The random bits of this algorithm are sets of kernels chosen from the Gaussian normal distribution, which are used to create a data structure that guarantees sublinear runtime complexity and retrieval accuracy. The algorithm focuses on selecting the optimal cardinalities of the kernels and their members. These cardinalities influence computational, memory and runtime complexities of the algorithm. We demonstrate that using the cardinality of the kernels to determine the kernel size guarantees the highest provable lower bound of the probability of collision of related items; thus accurate retrieval of nearest neighbors. When implemented on Cray XE6 ma- chine using 76 processes, our algorithm achieves speedup of 69 and approximately 48x performance gain in average query runtime with the retrieval accuracy of 90.5%.
Original languageEnglish (US)
JournalJournal of Pattern Recognition Research
Volume9
Issue number1
DOIs
StatePublished - 2014

Fingerprint Dive into the research topics of 'Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions'. Together they form a unique fingerprint.

Cite this