Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world

Yawei Li, Qingyun Liu, Zexian Zeng, Yuan Luo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Abstract

The population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrue, grouping them into clusters is important for organizing the landscape of the virus's population structure. Because little prior information was available on the newly emerged coronavirus, we used four different clustering algorithms to group 16,873 SARS-CoV-2 strains, which enables automatic identification of the spatial structure of SARS-CoV-2. Using mutation profiles as input features, we identified a total of six distinct genomic clusters. Comparison of the clustering results reveals that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best, producing the smallest intra-cluster pairwise genetic distances. The proportions of the six clusters varied across continents, revealing specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were dispersed across all six clusters. In summary, this study provides a concrete framework for using clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be applied in future studies of the population structure of variants of this fast-growing virus in specific regions.
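The abstract evaluates clusterings by their intra-cluster pairwise genetic distances and then summarizes how clusters are distributed across continents. The sketch below illustrates both steps under stated assumptions; it is not the authors' pipeline. It assumes each strain is encoded as a binary presence/absence vector over a fixed list of mutation sites, and it uses Hamming distance as a stand-in for pairwise genetic distance. The function names, strain IDs, continent assignments, and cluster labels are hypothetical toy data, and none of the four clustering algorithms is reproduced; the labels are simply given.

```python
from collections import Counter, defaultdict
from itertools import combinations
from statistics import mean


def mutation_profiles(strains, sites):
    """Hypothetical encoding: strain_id -> 0/1 tuple over a fixed list of mutation sites."""
    return {sid: tuple(1 if site in muts else 0 for site in sites)
            for sid, muts in strains.items()}


def hamming(a, b):
    """Number of sites at which two mutation profiles differ (proxy for genetic distance)."""
    return sum(x != y for x, y in zip(a, b))


def per_cluster_distance(profiles, labels):
    """Mean pairwise Hamming distance inside each cluster that has at least two members."""
    members = defaultdict(list)
    for sid, lab in labels.items():
        members[lab].append(profiles[sid])
    return {lab: mean(hamming(a, b) for a, b in combinations(vecs, 2))
            for lab, vecs in members.items() if len(vecs) >= 2}


def pooled_intra_cluster_distance(profiles, labels):
    """Mean Hamming distance over all strain pairs that share a cluster label."""
    return mean(hamming(profiles[a], profiles[b])
                for a, b in combinations(labels, 2) if labels[a] == labels[b])


def cluster_proportions(labels, continent):
    """Fraction of each continent's strains assigned to each cluster."""
    counts = defaultdict(Counter)
    for sid, lab in labels.items():
        counts[continent[sid]][lab] += 1
    return {c: {lab: n / sum(cnt.values()) for lab, n in sorted(cnt.items())}
            for c, cnt in counts.items()}


# Hypothetical toy data (five strains, two clusters), for illustration only.
sites = ["C241T", "C3037T", "A23403G", "C8782T", "T28144C"]
strains = {
    "s1": {"C241T", "C3037T", "A23403G"},
    "s2": {"C241T", "C3037T", "A23403G"},
    "s3": {"C241T", "A23403G"},
    "s4": {"C8782T", "T28144C"},
    "s5": {"C8782T"},
}
labels = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
continent = {"s1": "Asia", "s2": "Oceania", "s3": "Asia", "s4": "Oceania", "s5": "Europe"}

profiles = mutation_profiles(strains, sites)
props = cluster_proportions(labels, continent)
n_clusters = len(set(labels.values()))
# Continents whose strains fall into every cluster found.
spanning = [c for c, p in props.items() if len(p) == n_clusters]

print(per_cluster_distance(profiles, labels))
print(pooled_intra_cluster_distance(profiles, labels))
print(props)
print(spanning)
```

On this toy input the pooled intra-cluster distance is 0.75, and only the hypothetical "Oceania" strains span both clusters. Comparing several algorithms in this way would mean computing the same distance summary for each algorithm's labels; a lower value indicates more genetically homogeneous clusters under the Hamming-distance assumption.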

Original language: English (US)
Title of host publication: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Editors: Yufei Huang, Lukasz Kurgan, Feng Luo, Xiaohua Tony Hu, Yidong Chen, Edward Dougherty, Andrzej Kloczkowski, Yaohang Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 58-63
Number of pages: 6
ISBN (Electronic): 9781665401265
State: Published - 2021
Event: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021 - Virtual, Online, United States
Duration: Dec 9 2021 to Dec 12 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021

Conference

Conference: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Country/Territory: United States
City: Virtual, Online
Period: 12/9/21 to 12/12/21

Keywords

  • Deep learning clustering
  • SARS-CoV-2
  • SNP
  • evolution
  • population structure

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Biomedical Engineering
  • Health Informatics
  • Information Systems and Management
