Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions

Kenneth Jordan Mccallum*, Jiping Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


Copy number variations (CNVs) are a significant source of genetic variation and are frequently associated with diseases such as cancer and autism. High-throughput sequencing data are increasingly used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and on simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
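The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the CNVfinder implementation, and every name and setting in it is an assumption made for illustration: the single GC-content covariate, the log link with coefficients `beta`, the dispersion `size`, the copy-number states 0 to 4, the near-zero mean used for copy number 0, and the "sticky" transition probability `stay`. The regression coefficients are taken as given rather than fitted. The emission distribution of each window is a negative binomial whose mean depends on that window's covariate, which is what makes the emissions inhomogeneous, and the Viterbi algorithm then recovers the most likely copy-number path.

```python
import math


def nb_logpmf(k, mu, size):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion `size` (variance = mu + mu**2 / size)."""
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(size / (size + mu))
            + k * math.log(mu / (size + mu)))


def expected_depth(copy_number, gc, beta):
    """Window-specific mean read count: a log-linear (regression) baseline for
    two copies, scaled by copy_number / 2. Copy number 0 gets a small positive
    mean (an assumption) so that stray reads keep a nonzero probability."""
    baseline = math.exp(beta[0] + beta[1] * gc)
    return max(copy_number, 0.01) / 2.0 * baseline


def viterbi_cnv(counts, gc, beta, size, states=(0, 1, 2, 3, 4), stay=0.99):
    """Most likely copy-number path under an HMM whose emission distribution
    changes from window to window through the covariate gc[t]."""
    n = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def log_emit(t, c):
        return nb_logpmf(counts[t], expected_depth(c, gc[t], beta), size)

    # Uniform initial distribution over states.
    score = [-math.log(n) + log_emit(0, c) for c in states]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j, c in enumerate(states):
            candidates = [score[i] + (log_stay if i == j else log_move)
                          for i in range(n)]
            best = max(range(n), key=candidates.__getitem__)
            new_score.append(candidates[best] + log_emit(t, c))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    j = max(range(n), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


# Toy data: a gain and a loss on a diploid background, with GC content
# alternating between windows so the expected depth is not constant.
beta = (math.log(100.0) - 0.4, 1.0)  # hypothetical coefficients
truth = [2] * 10 + [3] * 10 + [2] * 10 + [1] * 10
gc = [0.3 if t % 2 == 0 else 0.5 for t in range(len(truth))]
counts = [round(expected_depth(c, g, beta)) for c, g in zip(truth, gc)]
called = viterbi_cnv(counts, gc, beta, size=50.0)
print(called == truth)
```

In this toy setting the counts are noise-free, so the example only shows how window-level covariates enter the emission means. A real analysis would estimate the regression coefficients and dispersion from data and would have to handle noisy counts.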

Original language: English (US)
Pages (from-to): 600-611
Number of pages: 12
Issue number: 3
State: Published - Jul 1 2013


Keywords

  • Copy number variation
  • Hidden Markov chain
  • High-throughput sequencing
  • Negative binomial regression
  • Sequence read depth

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

