pFANGS: Parallel high speed sequence mapping for next generation 454-roche sequencing reads

Sanchit Misra*, Ramanathan Narayanan, Wei Keng Liao, Alok Choudhary, Simon Lin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

Millions of DNA sequences (reads) are generated by Next Generation Sequencing machines everyday. There is a need for high performance algorithms to map these sequences to the reference genome to identify single nucleotide polymorphisms or rare transcripts to fulfill the dream of personalized medicine. In this paper, we present a high-throughput parallel sequence mapping program pFANGS. pFANGS is designed to find all the matches of a query sequence in the reference genome tolerating a large number of mismatches or insertions/ deletions. pFANGS partitions the computational workload and data among all the processes and employs loadbalancing mechanisms to ensure better process efficiency. Our experiments show that, with 512 processors, we are able to map approximately 31 million 454/Roche queries of length 500 each to a reference human genome per hour allowing 5 mismatches or insertion/deletions at full sensitivity. We also report and compare the performance results of two alternative parallel implementations of pFANGS: a shared memory OpenMP implementation and a MPI-OpenMP hybrid implementation.

Original languageEnglish (US)
Title of host publicationProceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
DOIs
StatePublished - Jul 2 2010
Event2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010 - Atlanta, GA, United States
Duration: Apr 19 2010Apr 23 2010

Publication series

NameProceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010

Other

Other2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2010
CountryUnited States
CityAtlanta, GA
Period4/19/104/23/10

Keywords

  • 454 sequencers
  • Next generation sequencers
  • Parallel computing
  • Sequence mapping

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Theoretical Computer Science

Fingerprint Dive into the research topics of 'pFANGS: Parallel high speed sequence mapping for next generation 454-roche sequencing reads'. Together they form a unique fingerprint.

Cite this