MultiCon: A Semi-Supervised Approach for Predicting Drug Function from Chemical Structure Analysis

Pracheta Sahoo, Indranil Roy*, Zhuoyi Wang, Feng Mi, Lin Yu, Pradeep Balasubramani, Latifur Khan*, J. Fraser Stoddart*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

Semi-supervised learning has proved effective at using extensive unlabeled data to reduce the need for large amounts of labeled data and to improve model performance. Despite its tremendous potential, semi-supervised learning has yet to be implemented in the field of drug discovery. Empirical testing of drugs and their classification is costly and time-consuming. In contrast, predicting therapeutic applications of drugs from their structural formulas using semi-supervised learning would reduce costs and time significantly. Herein, we employ a new multicontrastive-based semi-supervised learning algorithm, MultiCon, for classifying drugs into 12 categories, according to therapeutic applications, on the basis of image analyses of their structural formulas. Through rational use of data balancing, online augmentations of the drug image data during training, and the combined use of multicontrastive loss with consistency regularization, MultiCon achieves better class prediction accuracies than state-of-the-art machine learning methods across a variety of existing semi-supervised learning benchmarks. In particular, it performs exceptionally well with a limited number of labeled examples. For instance, with just 5000 labeled drugs in a PubChem (D3) data set, MultiCon achieved a class prediction accuracy of 97.74%.
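
The abstract names a contrastive loss combined with consistency regularization but does not give the formulas. As a rough illustration only, the sketch below shows how generic versions of such terms are commonly combined in semi-supervised training. It is not the MultiCon loss: the function names, the InfoNCE-style contrastive form, the mean-squared-error consistency term, the loss weights, and the temperature value are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Supervised loss for one labeled example."""
    return -math.log(softmax(logits)[label])

def consistency_loss(logits_a, logits_b):
    """Mean squared difference between the class probabilities predicted
    for two augmented views of the same unlabeled image (assumed form)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE-style term: low when the anchor embedding is close
    to the positive and far from the negatives. The temperature is a
    hypothetical default, not a value from the paper."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def total_loss(supervised, consistency, contrastive, w_cons=1.0, w_contr=1.0):
    """Weighted sum of the three terms; the weights are illustrative."""
    return supervised + w_cons * consistency + w_contr * contrastive

# Toy usage with made-up logits and 2-D embeddings.
sup = cross_entropy([2.0, 0.5, 0.1], label=0)
cons = consistency_loss([1.0, 0.2, 0.1], [0.9, 0.3, 0.1])
contr = contrastive_loss([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
loss = total_loss(sup, cons, contr)
```

In a setup like this, the supervised term uses only the labeled drug images, while the consistency and contrastive terms can also draw on unlabeled ones, which is the general reason such objectives help when labels are scarce.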

Original language: English (US)
Pages (from-to): 5995-6006
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 60
Issue number: 12
State: Published - Dec 28 2020

Funding

The authors thank Northwestern University and The University of Texas at Dallas for their continued support for this research. The research reported herein was supported in part by NSF awards (DMS-1737978, DGE-2039542, OAC-1828467, OAC-1931541, DGE-1906630) and an IBM faculty award (research).

ASJC Scopus subject areas

  • General Chemical Engineering
  • General Chemistry
  • Library and Information Sciences
  • Computer Science Applications
