TY - JOUR
T1 - Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis
AU - Kelkar, Dhanashree S.
AU - Provost, Elayne
AU - Chaerkady, Raghothama
AU - Muthusamy, Babylakshmi
AU - Manda, Srikanth S.
AU - Subbannayya, Tejaswini
AU - Selvan, Lakshmi Dhevi N.
AU - Wang, Chieh Huei
AU - Datta, Keshava K.
AU - Woo, Sunghee
AU - Dwivedi, Sutopa B.
AU - Renuse, Santosh
AU - Getnet, Derese
AU - Huang, Tai Chung
AU - Kim, Min Sik
AU - Pinto, Sneha M.
AU - Mitchell, Christopher J.
AU - Madugundu, Anil K.
AU - Kumar, Praveen
AU - Sharma, Jyoti
AU - Advani, Jayshree
AU - Dey, Gourav
AU - Balakrishnan, Lavanya
AU - Syed, Nazia
AU - Nanjappa, Vishalakshi
AU - Subbannayya, Yashwanth
AU - Goel, Renu
AU - Prasad, T. S. Keshava
AU - Bafna, Vineet
AU - Sirdeshmukh, Ravi
AU - Gowda, Harsha
AU - Wang, Charles
AU - Leach, Steven D.
AU - Pandey, Akhilesh
N1 - Publisher Copyright: © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
PY - 2014/11/1
Y1 - 2014/11/1
AB - Accurate annotation of protein-coding genes is one of the primary tasks upon the completion of whole genome sequencing of any organism. In this study, we used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. More than 7,000 proteins were identified from proteomic analyses, and ~69,000 high-confidence transcripts were assembled from the RNA sequencing data. Approximately 15% of the transcripts mapped to intergenic regions, the majority of which are likely long non-coding RNAs. These high-quality transcriptomic and proteomic data were used to manually reannotate the zebrafish genome. We report the identification of 157 novel protein-coding genes. In addition, our data led to modification of existing gene structures including novel exons, changes in exon coordinates, changes in frame of translation, translation in annotated UTRs, and joining of genes. Finally, we discovered four instances of genome assembly errors that were supported by both proteomic and transcriptomic data. Our study shows how an integrative analysis of the transcriptome and the proteome can extend our understanding of even well-annotated genomes.
UR - http://www.scopus.com/inward/record.url?scp=84910647485&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84910647485&partnerID=8YFLogxK
U2 - 10.1074/mcp.M114.038299
DO - 10.1074/mcp.M114.038299
M3 - Article
C2 - 25060758
AN - SCOPUS:84910647485
SN - 1535-9476
VL - 13
SP - 3184
EP - 3198
JO - Molecular and Cellular Proteomics
JF - Molecular and Cellular Proteomics
IS - 11
ER -