The C-Score: A Bayesian framework to sharply improve proteoform scoring in high-throughput top down proteomics

Richard D. Leduc*, Ryan T. Fellers, Bryan P. Early, Joseph B. Greer, Paul M. Thomas, Neil L. Kelleher

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

56 Scopus citations

Abstract

The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the "C-score" (short for Characterization Score), a Bayesian approach to the proteoform identification and characterization problem, implemented within a framework to allow the infusion of expert knowledge into generative models that take advantage of known properties of proteins and top down analytical systems (e.g., fragmentation propensities, "off-by-1 Da" discontinuous errors, and intelligent weighting for site-specific modifications). The performance of the scoring system based on the initial generative models was compared to the current probability-based scoring system used within both ProSightPC and ProSightPTM on a manually curated set of 295 human proteoforms. The current implementation of the C-score framework generated a marked improvement over the existing scoring system as measured by the area under the curve on the resulting ROC chart (AUC of 0.99 versus 0.78).
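The Bayesian idea summarized in the abstract can be illustrated with a minimal sketch, assuming a set of candidate proteoforms that each have a prior probability and a likelihood of the observed fragment data under some generative model. The function name, the uniform prior, and the log-likelihood values below are hypothetical and chosen only for illustration; they are not the paper's generative models, and the mapping from a posterior to the reported C-score is not given in the abstract.

```python
import math


def posterior_over_candidates(log_priors, log_likelihoods):
    """Normalize prior * likelihood over candidates using Bayes' rule.

    Works in log space (log-sum-exp) so that very small likelihoods
    do not underflow to zero.
    """
    log_joint = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    peak = max(log_joint)
    # log of the evidence: log(sum_i prior_i * likelihood_i)
    log_evidence = peak + math.log(sum(math.exp(x - peak) for x in log_joint))
    return [math.exp(x - log_evidence) for x in log_joint]


# Hypothetical example: three candidate proteoforms with a uniform prior
# and made-up log-likelihoods of the observed spectrum.
log_priors = [math.log(1.0 / 3.0)] * 3
log_likelihoods = [-10.0, -12.0, -20.0]
posteriors = posterior_over_candidates(log_priors, log_likelihoods)
```

In this toy setting the posteriors sum to one, and the candidate with the highest likelihood receives most of the probability mass. Expert knowledge of the kind the abstract mentions (for example, fragmentation propensities) would enter through the likelihood or prior terms, which are simply supplied as inputs here.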

Original language: English (US)
Pages (from-to): 3231-3240
Number of pages: 10
Journal: Journal of Proteome Research
Volume: 13
Issue number: 7
State: Published - Jul 3 2014

Keywords

  • Bayesian scoring
  • proteoform characterization
  • top down proteomics

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry (all)
