Abstract
Biclustering refers to simultaneous clustering of objects and their features. Use of biclustering is gaining momentum in areas such as text mining, gene expression analysis and collaborative filtering. Due to requirements for high performance in large scale data processing applications such as Collaborative filtering in E-commerce systems and large scale genome-wide gene expression analysis in microarray experiments, a high performance prallel/distributed solution for biclustering problem is highly desirable. Recently, Ahmad et al [1] showed that Bipartite Spectral Partitioning, which is a popular technique for biclustering, can be reformulated as a graph drawing problem where objective is to minimize Hall's energy of the bipartite graph representation of the input data. They showed that optimal solution to this problem is achieved when nodes are placed at the barycenter of their neighbors. In this paper, we provide a parallel algorithm for biclustering based on this formulation. We show that parallel energy minimization using barycenter heuristic is embarrassingly parallel. The challenge is to design a bicluster identification algorithm which is scalable as well as accurate. We show that our parallel implementation is not just extremely scalable, it is comparable in accuracy as well with serial implementation. We have evaluated proposed parallel biclustering algorithm with large synthetic data sets on upto 256 processors. Experimental evaluation shows large superlinear speedups, scalability and high level of accuracy.
Original language | English (US) |
---|---|
Title of host publication | Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133 |
Pages | 1045-1056 |
Number of pages | 12 |
Volume | 2 |
State | Published - Dec 31 2009 |
Event | 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States Duration: Apr 30 2009 → May 2 2009 |
Other
Other | 9th SIAM International Conference on Data Mining 2009, SDM 2009 |
---|---|
Country/Territory | United States |
City | Sparks, NV |
Period | 4/30/09 → 5/2/09 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Software
- Applied Mathematics