Out-of-memory problems are frequently encountered when processing thousands of CEL files with Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The CAMDA 2007 META-analysis data set, which contains 5896 CEL files, was used to test the approach on a typical commodity computer cluster by running established Bioconductor pre-processing algorithms for Affymetrix arrays. The results were validated against a gold standard obtained using a supercomputer. In addition to improving performance, the general divide-and-conquer strategy can be applied to any other normalisation algorithm without modifying the underlying implementation.
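The abstract does not spell out how the division and resampling are combined, so the following Python sketch shows only one possible reading, not the authors' implementation. It assumes the file list is shuffled and cut into chunks small enough to fit in memory, that each chunk is passed unchanged to an existing normalisation routine, and that per-file results from several random rounds are averaged. The names `random_partitions`, `divide_and_conquer`, `normalise_chunk`, `chunk_size` and `rounds` are hypothetical, and each file's result is reduced to a single number purely to keep the example short.

```python
import random
from statistics import fmean
from typing import Callable, Dict, List, Sequence


def random_partitions(items: Sequence[str], chunk_size: int,
                      rng: random.Random) -> List[List[str]]:
    """Shuffle the items and cut them into chunks of at most chunk_size."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i:i + chunk_size]
            for i in range(0, len(shuffled), chunk_size)]


def divide_and_conquer(files: Sequence[str],
                       normalise_chunk: Callable[[List[str]], Dict[str, float]],
                       chunk_size: int,
                       rounds: int = 1,
                       seed: int = 0) -> Dict[str, float]:
    """Run normalise_chunk on random chunks and average the results per file.

    normalise_chunk is a stand-in for an unmodified normalisation routine;
    it only ever sees chunk_size files at a time, which bounds memory use.
    Averaging over rounds is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    collected: Dict[str, List[float]] = {name: [] for name in files}
    for _ in range(rounds):
        for chunk in random_partitions(files, chunk_size, rng):
            for name, value in normalise_chunk(chunk).items():
                collected[name].append(value)
    return {name: fmean(values) for name, values in collected.items()}


if __name__ == "__main__":
    # Toy data: file names and raw intensities are invented for illustration.
    raw = {f"sample_{i}.CEL": float(i) for i in range(10)}

    def centre_on_chunk_mean(chunk: List[str]) -> Dict[str, float]:
        """Toy normalisation: subtract the mean of the chunk from each value."""
        mean = fmean(raw[name] for name in chunk)
        return {name: raw[name] - mean for name in chunk}

    result = divide_and_conquer(list(raw), centre_on_chunk_mean,
                                chunk_size=4, rounds=20, seed=1)
    for name in sorted(result):
        print(name, round(result[name], 3))
```

Because the normalisation routine is passed in as a function, the same driver works with any algorithm that accepts a list of files, which mirrors the abstract's claim that the underlying implementation need not be modified.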
|Original language|English (US)|
|Number of pages|10|
|Journal|International Journal of Computational Biology and Drug Design|
|State|Published - 2008|
Lee, C-H., Fu, D., Du, P., Jiang, H., Lin, S. M., & Kibbe, W. (2008). A divide-and-conquer strategy to solve the out-of-memory problem of processing thousands of Affymetrix microarrays. International Journal of Computational Biology and Drug Design, 1(4), 396-405.