A FAIR Public Repository for Experimentally Verified Proteoforms

  • Thomas, Paul Martin (PD/PI)

Project: Research project

Project Details


Proteoforms are key mediators of biological phenotypes. However, there is currently no systematic way to uniquely identify these chemical entities and no database that catalogs proteoforms for future reference and use. To make proteoforms findable, accessible, interoperable and reusable (FAIR), experimentally verified proteoforms need to be uniquely identified and stored in an open framework available to the scientific community. If a proteoform is easily recognized and linked to known biological metadata, future researchers can connect their discoveries with previous ones and formulate new hypotheses. This is important to the members of the Consortium for Top-Down Proteomics (CTDP), a non-profit organization established to promote top-down proteomics.
Here, we propose to create a scalable, two-tiered informatic framework for organizing and storing experimentally verified proteoforms. The system will consist of a central database, which stores a minimal set of information about each proteoform, and a flexible framework for creating individual proteoform knowledgebases. Interest in developing such a resource is high among the top-down proteomics community, software developers, and leading bioinformaticians (see 15 Letters of Support). This includes a strong desire from UniProt to use experimentally verified proteoforms to bolster its leading protein knowledgebase. After the granting period ends, we believe that the central database should be community-owned and curated by the CTDP, and that the knowledgebase framework should be open-source and maintained by the top-down community. Therefore, this proposal is divided between developing deliverable software and expanding existing community-centered collaborations for software dissemination.
The Specific Aims focus on: 1) establishing norms for communicating proteoforms; 2) developing public proteoform databases and the domain-specific proteoform knowledgebase framework; and 3) engaging the scientific community to promote their use. The success of this project will be measured by its dissemination.
Upon completion of this grant, we will have established and prepared a self-governing body to oversee the development and maintenance of bioinformatic software for the storage and dissemination of experimentally verified proteoforms. This body, managed by the CTDP, will have the initial tools to create public proteoform databases and a sustainable governance system in compliance with FAIR principles.
Effective start/end date: 6/1/19 to 5/31/21


  • National Library of Medicine (5R21LM013097-02)

