The importance of recognizing and reporting sequence database contamination for proteomics

Olivier Pible, Erica M. Hartmann, Gilles Imbert, Jean Armengaud*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


Advances in genome sequencing have made proteomic experiments more successful than ever. However, not all entries in a sequence database are of equal quality. Genome sequences are contaminated more frequently than is admitted. Contamination impacts homology-based proteomic, proteogenomic, and metaproteomic results. We highlight two examples in the National Center for Biotechnology Information non-redundant database (NCBInr) that are likely contaminated: the bacterium Enterococcus gallinarum EGD-AAK12 and the insect Ceratitis capitata. We hope to incite users of this and other databases to critically evaluate submitted sequences and to contribute to the overall quality of the database by signaling potential errors when possible.

Original language: English (US)
Pages (from-to): 246-249
Number of pages: 4
Journal: EuPA Open Proteomics
State: Published - Jun 2014


  • Blast analysis
  • Contamination
  • Curation
  • Database
  • Metaproteomics
  • Proteomics

ASJC Scopus subject areas

  • Biochemistry

