Abstract
Copy number variations (CNVs) are a significant source of genetic variation and are frequently associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed using inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
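To make the modeling idea concrete, the following is a minimal sketch of an HMM whose emission distribution changes from window to window through a negative binomial regression on a covariate. It is an illustration under stated assumptions, not the authors' algorithm and not the CNVfinder interface: the function names, the single GC-content covariate, the log-linear mean, the fixed coefficients `beta0` and `beta_gc`, the dispersion `size`, the symmetric transition matrix, and the use of Viterbi decoding are all hypothetical choices made for this example. In practice the regression coefficients would be estimated from the data rather than supplied.

```python
import math


def nb_logpmf(y, mu, size):
    """Log pmf of a negative binomial with mean mu and variance mu + mu**2 / size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def viterbi_cnv(counts, gc, copy_states=(1, 2, 3),
                beta0=4.0, beta_gc=1.0, size=10.0, stay=0.99):
    """Most likely copy-number path for per-window read counts.

    All parameters are hypothetical illustration values. The emission mean
    for window t in copy-number state c is exp(beta0 + beta_gc * gc[t]) * c / 2,
    so the emission distribution is inhomogeneous across windows.
    """
    n_states = len(copy_states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n_states - 1))

    def emit(t, k):
        # Log-linear regression on the covariate, scaled by copy number
        # relative to the diploid baseline of 2.
        mu = math.exp(beta0 + beta_gc * gc[t]) * copy_states[k] / 2.0
        return nb_logpmf(counts[t], mu, size)

    def trans(j, k):
        return log_stay if j == k else log_move

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n_states) + emit(0, k) for k in range(n_states)]
    back = []
    for t in range(1, len(counts)):
        new_score, ptr = [], []
        for k in range(n_states):
            best_j = max(range(n_states), key=lambda j: score[j] + trans(j, k))
            ptr.append(best_j)
            new_score.append(score[best_j] + trans(best_j, k) + emit(t, k))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [copy_states[k] for k in path]
```

Calling `viterbi_cnv(counts, gc)` with two equal-length lists (read counts and GC fractions per window) returns one copy-number call per window. Modeling the bias inside the emission mean, rather than dividing it out of the counts beforehand, is the contrast with normalization-based methods that the abstract draws.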
Original language | English (US) |
---|---|
Pages (from-to) | 600-611 |
Number of pages | 12 |
Journal | Biostatistics |
Volume | 14 |
Issue number | 3 |
State | Published - Jul 2013 |
Keywords
- Copy number variation
- Hidden Markov chain
- High-throughput sequencing
- Negative binomial regression
- Sequence read depth
ASJC Scopus subject areas
- General Medicine