Accurate estimation of context-dependent false discovery rates in top-down proteomics

Richard D. LeDuc, Ryan T. Fellers, Bryan P. Early, Joseph B. Greer, Daniel P. Shams, Paul Martin Thomas, Neil L. Kelleher*

*Corresponding author for this work

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD FDR Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
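
The gap between a PrSM-level and a protein-level FDR can be illustrated with a toy calculation. The sketch below is not the TDCD FDR Calculator and does not reproduce the authors' framework; it assumes a plain target-decoy estimate (decoy hits divided by target hits), and the `PrSM` record, the `fdr_by_level` helper, and the toy data are all hypothetical. It only shows why counting at different levels can give very different rates for the same set of accepted hits.

```python
from collections import namedtuple

# Hypothetical record: one proteoform spectral match (PrSM) that passed the score cutoff.
PrSM = namedtuple("PrSM", ["spectrum_id", "protein", "is_decoy"])


def decoy_fdr(n_decoy, n_target):
    """Simple target-decoy estimate (an assumption): decoy hits / target hits."""
    return n_decoy / n_target if n_target else 0.0


def fdr_by_level(prsms):
    """Return (PrSM-level FDR, protein-level FDR) for one set of accepted hits."""
    targets = [p for p in prsms if not p.is_decoy]
    decoys = [p for p in prsms if p.is_decoy]
    prsm_fdr = decoy_fdr(len(decoys), len(targets))

    # Collapse hits to distinct protein accessions before counting.
    target_proteins = {p.protein for p in targets}
    decoy_proteins = {p.protein for p in decoys}
    protein_fdr = decoy_fdr(len(decoy_proteins), len(target_proteins))
    return prsm_fdr, protein_fdr


# Toy data: 200 target PrSMs pile onto 10 proteins,
# while 4 decoy PrSMs each hit a different decoy protein.
hits = [PrSM(i, f"P{i % 10}", False) for i in range(200)]
hits += [PrSM(200 + i, f"DECOY_{i}", True) for i in range(4)]

prsm_fdr, protein_fdr = fdr_by_level(hits)
print(f"PrSM-level FDR:    {prsm_fdr:.3f}")
print(f"Protein-level FDR: {protein_fdr:.3f}")
```

With this toy input the PrSM-level estimate is 0.020 and the protein-level estimate is 0.400: correct matches concentrate on a few proteins while false matches scatter, so collapsing to proteins shrinks the target count far more than the decoy count.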

Original language: English (US)
Pages (from-to): 796-805
Number of pages: 10
Journal: Molecular and Cellular Proteomics
Volume: 18
Issue number: 4
DOIs: 10.1074/mcp.RA118.000993
State: Published - Jan 1 2019

Fingerprint

Proteomics
Search Engine
Search engines
Protein Isoforms
Proteins
Throughput
Databases

ASJC Scopus subject areas

  • Analytical Chemistry
  • Biochemistry
  • Molecular Biology

Cite this

LeDuc, Richard D. ; Fellers, Ryan T. ; Early, Bryan P. ; Greer, Joseph B. ; Shams, Daniel P. ; Thomas, Paul Martin ; Kelleher, Neil L. / Accurate estimation of context-dependent false discovery rates in top-down proteomics. In: Molecular and Cellular Proteomics. 2019 ; Vol. 18, No. 4. pp. 796-805.
@article{fbbba5e36fa649d89bb48cf63e9c6a65,
title = "Accurate estimation of context-dependent false discovery rates in top-down proteomics",
abstract = "Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD FDR Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.",
author = "LeDuc, {Richard D.} and Fellers, {Ryan T.} and Early, {Bryan P.} and Greer, {Joseph B.} and Shams, {Daniel P.} and Thomas, {Paul Martin} and Kelleher, {Neil L.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1074/mcp.RA118.000993",
language = "English (US)",
volume = "18",
pages = "796--805",
journal = "Molecular and Cellular Proteomics",
issn = "1535-9476",
publisher = "American Society for Biochemistry and Molecular Biology Inc.",
number = "4",
}

TY - JOUR

T1 - Accurate estimation of context-dependent false discovery rates in top-down proteomics

AU - LeDuc, Richard D.

AU - Fellers, Ryan T.

AU - Early, Bryan P.

AU - Greer, Joseph B.

AU - Shams, Daniel P.

AU - Thomas, Paul Martin

AU - Kelleher, Neil L.

PY - 2019/1/1

Y1 - 2019/1/1

AB - Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD FDR Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.

UR - http://www.scopus.com/inward/record.url?scp=85064123868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064123868&partnerID=8YFLogxK

U2 - 10.1074/mcp.RA118.000993

DO - 10.1074/mcp.RA118.000993

M3 - Article

C2 - 30647073

AN - SCOPUS:85064123868

VL - 18

SP - 796

EP - 805

JO - Molecular and Cellular Proteomics

JF - Molecular and Cellular Proteomics

SN - 1535-9476

IS - 4

ER -