Crystal Site Feature Embedding Enables Exploration of Large Chemical Spaces

Hitarth Choubisa, Mikhail Askerka, Kevin Ryczko, Oleksandr Voznyy, Kyle Mills, Isaac Tamblyn*, Edward H. Sargent

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

31 Scopus citations


Mapping materials science problems onto computational frameworks suitable for machine learning can accelerate materials discovery. Combining the proposed crystal site feature embedding (CSFE) representation with convolutional and extensive deep neural networks, we achieve low mean absolute test errors of 3.7 meV/atom and 0.069 eV on density functional theory energies and band gaps, respectively, of mixed halide perovskites. We explore how a small amount of cadmium doping can potentially be applied in solar cell design and sample the large chemical space by using a variational autoencoder to discover interesting perovskites with band gaps in the ultraviolet and infrared. Additionally, we use CSFE to explore chemical spaces and small doping concentrations beyond those used for training. We further show that CSFE has mean absolute test errors of 7 meV/atom and 0.13 eV for total energies and band gaps, respectively, of 2D perovskites and discuss its adaptability for exploration of an even wider variety of chemical systems.

Density functional theory (DFT) is of interest in modern-day materials discovery, but it is computationally expensive. Here, we develop a new crystal site feature embedding (CSFE) representation that achieves low error in predicting DFT properties and enables predicting properties of chemical families and doping fractions beyond those present in the training datasets. Using CSFE with autoencoders, we present a scheme that enables sampling of large chemical spaces and offers insight into key semiconductor parameters such as band gap. We demonstrate that CSFE works on both 2D and 3D perovskites and identify promising ultraviolet and infrared candidate materials.

Here, we report crystal site feature embedding (CSFE), a representation for machine learning of materials that achieves low mean absolute errors for density functional theory band gaps and formation energies. Using CSFE with convolutional neural networks (CNNs) and extensive deep neural networks (EDNNs), we explored chemical families and doping fractions beyond those present in the training dataset. CSFE allowed us to sample large chemical spaces for materials of interest using autoencoders. We demonstrate the application of the representation by finding perovskite compositions with band gaps in the ultraviolet and infrared.
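
The errors quoted above are mean absolute errors on held-out test data, reported per atom for energies (meV/atom) and in absolute terms for band gaps (eV). The following is a minimal sketch of how such figures can be computed. It is not the authors' code: the function names and the toy numbers are hypothetical, and it assumes the energy error is normalized by the number of atoms in each structure, as the unit meV/atom suggests.

```python
def mae_per_atom_mev(e_pred_ev, e_ref_ev, n_atoms):
    """Mean absolute energy error, normalized per atom, in meV/atom.

    Assumption: each structure's total-energy error (in eV) is divided
    by that structure's atom count before averaging.
    """
    if not (len(e_pred_ev) == len(e_ref_ev) == len(n_atoms)):
        raise ValueError("all inputs must have the same length")
    per_atom_errors_ev = [
        abs(pred - ref) / n
        for pred, ref, n in zip(e_pred_ev, e_ref_ev, n_atoms)
    ]
    # Convert eV to meV after averaging.
    return 1000.0 * sum(per_atom_errors_ev) / len(per_atom_errors_ev)


def mae_ev(gap_pred_ev, gap_ref_ev):
    """Mean absolute band gap error in eV (no per-atom normalization)."""
    if len(gap_pred_ev) != len(gap_ref_ev):
        raise ValueError("inputs must have the same length")
    errors = [abs(pred - ref) for pred, ref in zip(gap_pred_ev, gap_ref_ev)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical toy values, not data from the paper.
    energy_mae = mae_per_atom_mev(
        e_pred_ev=[-100.02, -200.05],
        e_ref_ev=[-100.00, -200.00],
        n_atoms=[20, 40],
    )
    gap_mae = mae_ev(gap_pred_ev=[1.5, 2.0], gap_ref_ev=[1.6, 1.9])
    print(f"energy MAE: {energy_mae:.3f} meV/atom, band gap MAE: {gap_mae:.3f} eV")
```

Normalizing by atom count makes energy errors comparable across unit cells of different sizes, which matters when a model is evaluated on both 3D and 2D perovskite structures.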

Original language: English (US)
Pages (from-to): 433-448
Number of pages: 16
Issue number: 2
State: Published - Aug 5 2020


Keywords

  • MAP3: Understanding
  • auto-encoders
  • convolutional neural networks
  • density functional theory
  • extensive deep neural networks
  • halide perovskites
  • machine learning
  • materials discovery
  • optoelectronic materials
  • photovoltaics

ASJC Scopus subject areas

  • General Materials Science

