TY - JOUR
T1 - CheMixNet
T2 - Mixed DNN architectures for predicting chemical properties using multiple molecular representations
AU - Paul, Arindam
AU - Jha, Dipendra
AU - Al-Bahrani, Reda
AU - Liao, Wei-keng
AU - Choudhary, Alok
AU - Agrawal, Ankit
N1 - Publisher Copyright: © 2018, The Authors. All rights reserved.
PY - 2018/11/14
Y1 - 2018/11/14
N2 - SMILES is a linear representation of chemical structures that encodes the connection table and the stereochemistry of a molecule as a line of text, with a grammar denoting atoms, bonds, rings, and chains; this information can be used to predict chemical properties. Molecular fingerprints are a standard, computationally efficient abstract representation of chemical structures in which structural features are encoded as a bit string; they have been used successfully in similarity search, clustering, classification, drug discovery, and virtual screening. SMILES and molecular fingerprints are thus two different representations of the structure of a molecule. Several predictive models exist for learning chemical properties from either SMILES or molecular fingerprints. Here, our goal is to build predictive models that leverage both molecular representations. In this work, we present CheMixNet, a set of neural networks for predicting chemical properties from a mixture of features learned from the two molecular representations: SMILES as sequences and molecular fingerprints as vector inputs. We demonstrate the efficacy of the CheMixNet architectures by evaluating them on six different datasets. The proposed CheMixNet models outperform not only the candidate neural architectures, such as contemporary fully connected networks that use molecular fingerprints and 1-D CNN and RNN models trained on SMILES sequences, but also other state-of-the-art architectures such as Chemception and Molecular Graph Convolutions.
UR - http://www.scopus.com/inward/record.url?scp=85095160094&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095160094&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85095160094
JO - Free Radical Biology and Medicine
JF - Free Radical Biology and Medicine
SN - 0891-5849
ER -