Identification of human gene research articles with wrongly identified nucleotide sequences

Yasunori Park, Rachael A. West, Pranujan Pathmendra, Bertrand Favier, Thomas Stoeger, Amanda Capes-Davis, Guillaume Cabanac, Cyril Labbé, Jennifer A. Byrne*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Nucleotide sequence reagents underpin molecular techniques that have been applied across hundreds of thousands of publications. We have previously reported wrongly identified nucleotide sequence reagents in human research publications and described a semi-automated screening tool Seek & Blastn to fact-check their claimed status. We applied Seek & Blastn to screen >11,700 publications across five literature corpora, including all original publications in Gene from 2007 to 2018 and all original open-access publications in Oncology Reports from 2014 to 2018. After manually checking Seek & Blastn outputs for >3,400 human research articles, we identified 712 articles across 78 journals that described at least one wrongly identified nucleotide sequence. Verifying the claimed identities of >13,700 sequences highlighted 1,535 wrongly identified sequences, most of which were claimed targeting reagents for the analysis of 365 human protein-coding genes and 120 non-coding RNAs. The 712 problematic articles have received >17,000 citations, including citations by human clinical trials. Given our estimate that approximately one-quarter of problematic articles may misinform the future development of human therapies, urgent measures are required to address unreliable gene research articles.

Original language: English (US)
Article number: e202101203
Journal: Life Science Alliance
Volume: 5
Issue number: 4
State: Published - Apr 2022

Funding

JA Byrne and C Labbé gratefully acknowledge funding from the US Office of Research Integrity, grant ID ORIIR180038-01-00. JA Byrne, C Labbé, and A Capes-Davis gratefully acknowledge grant funding from the National Health and Medical Research Council of Australia, Ideas grant ID APP1184263. T Stoeger gratefully acknowledges funding from the National Science Foundation, 1956338, SCISIPBIO: A data-science approach to evaluating the likelihood of fraud and error in published studies; K99AG068544, National Institutes on Aging, Integrative Multi-Scale Systems Analysis of Gene-Expression-Driven Aging Morbidity; National Institute of Allergy and Infectious Diseases, AI135964, Successful Clinical Response In Pneumonia Therapy Systems Biology Center. The authors thank journal editorial staff for discussions and support of this study, and two anonymous peer reviewers for their insightful comments.

ASJC Scopus subject areas

  • Ecology
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Plant Science
  • Health, Toxicology and Mutagenesis
