Maintaining the integrity of human immunodeficiency virus sequence databases

Gerald H. Learn*, Bette T. M. Korber, Brian Foley, Beatrice H. Hahn, Steven M. Wolinsky, James I. Mullins

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Human immunodeficiency virus type 1 (HIV-1) sequences are accumulating in the literature at a rapid pace. For this ever-expanding resource to be maximally useful, it is critical that researchers strive to maintain a high level of quality assurance, both in experimental design and conduct and in analyses. Here we present detailed analyses of problematic sets of HIV-1 sequences in the database that include sequence anomalies suggestive of mislabeling or sample contamination problems. These data are examined in the context of currently available HIV-1 sequence information to provide an example of how to identify potentially flawed data. Indicators of potential problems with sequences are (i) sequences that are nearly identical that are supposed to be derived from unlinked individuals and that are markedly distinct from other sequences from the putative source or (ii) sequences that are nearly identical to those of laboratory strains. We provide an outline of methods that researchers can use to perform preliminary laboratory and computational analyses that could help identify problematic data and thus help ensure the integrity of sequence databases.
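The two indicators named in the abstract can be illustrated with a short computational screen. The following is a minimal sketch only, not the authors' procedure: it assumes the sequences are already aligned to equal length, uses a simple proportion of differing sites as the distance, and applies an arbitrary cutoff. The function names, the `threshold` parameter, and the input layout are all hypothetical. It also simplifies indicator (i) by not checking whether a flagged sequence is markedly distinct from other sequences from the same putative source.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def flag_suspects(samples, lab_strains, threshold=0.01):
    """Flag sequences matching either indicator.

    samples: list of (seq_id, individual, aligned_sequence) tuples.
    lab_strains: dict mapping strain name to aligned sequence.
    threshold: hypothetical cutoff for "nearly identical" (an assumption).
    """
    flags = []
    # Indicator (i), simplified: near-identical sequences attributed to
    # different individuals.
    for (id1, ind1, s1), (id2, ind2, s2) in combinations(samples, 2):
        if ind1 != ind2 and p_distance(s1, s2) <= threshold:
            flags.append(("unlinked_near_identical", id1, id2))
    # Indicator (ii): near-identity to a known laboratory strain.
    for seq_id, _, seq in samples:
        for name, ref in lab_strains.items():
            if p_distance(seq, ref) <= threshold:
                flags.append(("near_lab_strain", seq_id, name))
    return flags
```

A flag from a screen like this is only a prompt for closer inspection. Epidemiologically linked individuals or highly conserved regions can produce near-identical sequences legitimately, so any cutoff would need to be chosen with the gene region and the sampled population in mind.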

Original language: English (US)
Pages (from-to): 5720-5730
Number of pages: 11
Journal: Journal of Virology
Issue number: 8
State: Published - Aug 1996

ASJC Scopus subject areas

  • Insect Science
  • Virology
  • Microbiology
  • Immunology

