Abstract
Studies have revealed dozens of functional peptides in putative ‘noncoding’ regions and raised the question of how many proteins are encoded by noncanonical open reading frames (ORFs). Here, we comprehensively annotate genome-wide translated ORFs across five eukaryotes (human, mouse, zebrafish, worm, and yeast) by analyzing ribosome profiling data. We develop a logistic regression model named PepScore based on ORF features (expected length, encoded domain, and conservation) to calculate the probability that the encoded peptide is stable in humans. Systematic ectopic expression validates PepScore and shows that stable complex-associating microproteins can be encoded in 5’/3’ untranslated regions and overlapping coding regions of mRNAs besides annotated noncoding RNAs. Stable noncanonical proteins follow conventional rules and localize to different subcellular compartments. Inhibition of proteasomal/lysosomal degradation pathways can stabilize some peptides especially those with moderate PepScores, but cannot rescue the expression of short ones with low PepScores suggesting they are directly degraded by cellular proteases. The majority of human noncanonical peptides with high PepScores show longer lengths but low conservation across species/mammals, and hundreds contain trait-associated genetic variants. Our study presents a statistical framework to identify stable noncanonical peptides in the genome and provides a valuable resource for functional characterization of noncanonical translation during development and disease.
Original language | English (US) |
---|---|
Article number | 1932 |
Journal | Nature communications |
Volume | 15 |
Issue number | 1 |
DOIs | |
State | Published - Dec 2024 |
Funding
This work was supported by grants to Z.J.: the National Institutes of Health (R35GM138192, R01HL161389, and R00CA207865), and the Lynn Sage Scholar fund. E.S. was supported by the Predoctoral Training Program in Biomedical Data Driven Discovery (T32LM012203). Proteomics services were performed by the Northwestern Proteomics Core Facility, supported by NCI CCSG P30 CA060553 awarded to the Robert H Lurie Comprehensive Cancer Center, instrumentation award (S10OD025194) from NIH Office of Director, and the National Resource for Translational and Developmental Proteomics supported by P41 GM108569. We thank the members of the Ji lab for helpful discussions.
ASJC Scopus subject areas
- General Chemistry
- General Biochemistry, Genetics and Molecular Biology
- General Physics and Astronomy