MeDEStrand: An improved method to infer genome-wide absolute methylation levels from DNA enrichment data

Jingting Xu, Shimeng Liu, Ping Yin, Serdar E Bulun, Yang Dai*

*Corresponding author for this work

Research output: Contribution to journal > Article

Abstract

Background: DNA methylation of CpG dinucleotides is an essential epigenetic modification that plays a key role in regulating transcription. Widely used DNA enrichment-based methods measure methylated CpG dinucleotides with high coverage and at the lowest cost per CpG covered genome-wide. However, these methods measure DNA enrichment by methyl-CpG binding and therefore do not provide absolute methylation levels. Moreover, the enrichment depends on confounding factors other than methylation status, such as CpG density. Computational models that can accurately derive absolute methylation levels from DNA enrichment data are therefore needed.

Results: We developed MeDEStrand, a method that uses a sigmoid function to estimate and correct the CpG bias in enrichment data in order to infer absolute DNA methylation levels. Unlike previous methods, which estimate the CpG bias from reads mapped to the same genomic loci, MeDEStrand processes the reads from the positive and negative DNA strands separately. We compared the performance of MeDEStrand with that of three other state-of-the-art methods, MEDIPS, BayMeth, and QSEA, on four independent datasets generated from immortalized cell lines (GM12878 and K562) and human primary cells (foreskin fibroblasts and mammary epithelial cells). When the absolute methylation levels each method inferred from MeDIP-seq data were compared with the corresponding reduced-representation bisulfite sequencing (RRBS) data, MeDEStrand performed best at the high resolutions of 25, 50, and 100 base pairs.

Conclusions: MeDEStrand can infer whole-genome absolute DNA methylation levels with adequate accuracy and resolution at the same cost as enrichment-based methods. The R package MeDEStrand and its tutorial are freely available for download at https://github.com/jxu1234/MeDEStrand.git.
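The abstract states the core idea only at a high level: model how the enrichment signal depends on CpG density with a sigmoid, divide that bias out, and handle reads from each strand separately. The Python sketch below illustrates that idea on binned read counts. It is not the MeDEStrand R package's implementation. The logistic form a / (1 + exp(-b(x - c))), the grid-search fit, the divide-and-clip step, averaging the two strands, and all function names are hypothetical choices made for illustration. It also assumes reads have already been extended and counted into fixed-width bins per strand; because CpG is palindromic, both strands share the same per-bin CpG count.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float, a: float, b: float, c: float) -> float:
    """Logistic curve a / (1 + exp(-b * (x - c))), computed without overflow."""
    z = b * (x - c)
    if z >= 0:
        return a / (1.0 + math.exp(-z))
    e = math.exp(z)
    return a * e / (1.0 + e)


def calibration_points(cpg: Sequence[int], signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Mean signal of all bins sharing the same CpG count (the calibration curve)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for n, s in zip(cpg, signal):
        groups[n].append(s)
    xs = sorted(groups)
    return [float(x) for x in xs], [sum(groups[x]) / len(groups[x]) for x in xs]


def fit_sigmoid(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of (a, b, c) by grid search over slope b and midpoint c.

    For fixed (b, c) the model is linear in the scale a, so the best a has a
    closed form. The grid (b in 0.1..3.0, c in steps of 0.5) is an arbitrary
    choice for this sketch.
    """
    best = None
    for b_step in range(1, 31):
        b = b_step / 10.0
        for c_step in range(0, 2 * int(max(xs)) + 1):
            c = c_step / 2.0
            s = [sigmoid(x, 1.0, b, c) for x in xs]
            a = sum(y * si for y, si in zip(ys, s)) / sum(si * si for si in s)
            sse = sum((y - a * si) ** 2 for y, si in zip(ys, s))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    return best[1], best[2], best[3]


def correct_strand(cpg: Sequence[int], signal: Sequence[float]) -> List[float]:
    """Divide each bin's signal by the fitted CpG-dependent curve, clipped to [0, 1]."""
    a, b, c = fit_sigmoid(*calibration_points(cpg, signal))
    out = []
    for n, s in zip(cpg, signal):
        expected = sigmoid(n, a, b, c)
        out.append(0.0 if expected <= 0 else min(1.0, max(0.0, s / expected)))
    return out


def strand_aware_methylation(
    cpg: Sequence[int], plus_signal: Sequence[float], minus_signal: Sequence[float]
) -> List[float]:
    """Correct the + and - strand signals independently, then average them per bin."""
    plus = correct_strand(cpg, plus_signal)
    minus = correct_strand(cpg, minus_signal)
    return [(p + m) / 2.0 for p, m in zip(plus, minus)]


# Synthetic example: 22 bins with CpG counts 0..10 (twice); the first copy
# is fully methylated, the second half-methylated.
cpg_counts = list(range(11)) * 2
true_meth = [1.0] * 11 + [0.5] * 11
plus = [m * sigmoid(n, 2.0, 1.0, 4.0) for n, m in zip(cpg_counts, true_meth)]
estimates = strand_aware_methylation(cpg_counts, plus, plus)
print([round(v, 2) for v in estimates])
```

In this toy run the fully methylated bins come out at 1.0 after clipping and the half-methylated bins at about 0.67, so the CpG dependence is removed and the relative ordering is preserved, but the naive division does not recover the true 0.5. That gap is a limitation of this simplified scaling step, not a claim about MeDEStrand. The dependency-free grid search is only for self-containment; a nonlinear least-squares routine such as SciPy's `scipy.optimize.curve_fit` would be a more typical way to fit the curve.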

Original language: English (US)
Article number: 540
Journal: BMC Bioinformatics
Volume: 19
Issue number: 1
DOI: 10.1186/s12859-018-2574-7
State: Published - Dec 22 2018

Keywords

  • CpG bias
  • DNA methylation
  • MeDIP-seq
  • RRBS
  • Sigmoid function

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{d6b16ecb6d054e3fb02fd307b8435616,
title = "MeDEStrand: An improved method to infer genome-wide absolute methylation levels from DNA enrichment data",
keywords = "CpG bias, DNA methylation, MeDIP-seq, RRBS, Sigmoid function",
author = "Xu, Jingting and Liu, Shimeng and Yin, Ping and Bulun, Serdar E. and Dai, Yang",
year = "2018",
month = "12",
day = "22",
doi = "10.1186/s12859-018-2574-7",
language = "English (US)",
volume = "19",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

MeDEStrand: An improved method to infer genome-wide absolute methylation levels from DNA enrichment data. / Xu, Jingting; Liu, Shimeng; Yin, Ping; Bulun, Serdar E; Dai, Yang.

In: BMC Bioinformatics, Vol. 19, No. 1, 540, 22.12.2018.

TY - JOUR

T1 - MeDEStrand: An improved method to infer genome-wide absolute methylation levels from DNA enrichment data

AU - Xu, Jingting

AU - Liu, Shimeng

AU - Yin, Ping

AU - Bulun, Serdar E

AU - Dai, Yang

PY - 2018/12/22

Y1 - 2018/12/22

KW - CpG bias

KW - DNA methylation

KW - MeDIP-seq

KW - RRBS

KW - Sigmoid function

UR - http://www.scopus.com/inward/record.url?scp=85058911995&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058911995&partnerID=8YFLogxK

U2 - 10.1186/s12859-018-2574-7

DO - 10.1186/s12859-018-2574-7

M3 - Article

C2 - 30577750

AN - SCOPUS:85058911995

VL - 19

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 540

ER -