Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions

Kenneth Jordan Mccallum*, Jiping Wang

*Corresponding author for this work

Research output: Contribution to journal › Article

7 Scopus citations

Abstract

Copy number variations (CNVs) are a significant source of genetic variation and have frequently been found to be associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and on simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
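
To make the idea of inhomogeneous emissions concrete, the following is a minimal sketch and not the CNVfinder implementation. It assumes a single hypothetical covariate (GC content per window), a log-linear regression for the expected read depth, a mean proportional to copy number relative to a diploid baseline, and a simple "stay or move" transition matrix. All function names, parameter values, and data are illustrative assumptions; none are taken from the paper or the package.

```python
import math

def nb_logpmf(y, mu, size):
    """Log-pmf of a negative binomial with mean mu and variance mu + mu**2 / size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def window_mean(copy_number, gc, beta0, beta1, floor=0.05):
    """Assumed regression: expected depth = dosage * exp(beta0 + beta1 * gc)."""
    # The floor keeps the mean positive for copy number 0 (stray reads).
    dosage = max(copy_number / 2.0, floor)
    return dosage * math.exp(beta0 + beta1 * gc)

def viterbi(counts, gcs, states, stay_prob, beta0, beta1, size):
    """Most likely copy-number path when the emission mean differs per window."""
    n = len(states)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))

    def trans(r, s):
        return log_stay if r == s else log_move

    def emit(t, s):
        # Inhomogeneous emission: the mean depends on the window's covariate.
        mu = window_mean(states[s], gcs[t], beta0, beta1)
        return nb_logpmf(counts[t], mu, size)

    score = [math.log(1.0 / n) + emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: score[r] + trans(r, s))
            new_score.append(score[best] + trans(best, s) + emit(t, s))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    s = max(range(n), key=lambda r: score[r])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return [states[i] for i in path]

# Made-up read counts and GC fractions for 14 windows.
gcs = [0.5, 0.5, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.4, 0.4, 0.5, 0.5, 0.5, 0.5]
counts = [100, 97, 122, 125, 150, 148, 153, 151, 82, 80, 50, 52, 49, 51]
path = viterbi(counts, gcs, states=[0, 1, 2, 3, 4], stay_prob=0.9,
               beta0=math.log(100.0) - 1.0, beta1=2.0, size=200.0)
print(path)
```

In this toy data the windows with GC fraction 0.6 have counts near 122, which would look elevated against a fixed baseline of 100. Because the emission mean is adjusted per window, those counts are consistent with the diploid state under the assumed regression.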

Original language: English (US)
Pages (from-to): 600-611
Number of pages: 12
Journal: Biostatistics
Volume: 14
Issue number: 3
State: Published - Jul 1 2013

Keywords

  • Copy number variation
  • Hidden Markov chain
  • High-throughput sequencing
  • Negative binomial regression
  • Sequence read depth

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
