Universal machine learning framework for defect predictions in zinc blende semiconductors

Arun Mannodi-Kanakkithodi*, Xiaofeng Xiang, Laura Jacoby, Robert Biegaj, Scott T. Dunham, Daniel R. Gamelin, Maria K.Y. Chan*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

8 Scopus citations


We develop a framework powered by machine learning (ML) and high-throughput density functional theory (DFT) computations for the prediction and screening of functional impurities in group IV, III–V, and II–VI zinc blende semiconductors. Elements spanning the length and breadth of the periodic table are considered as impurity atoms at the cation, anion, or interstitial sites in supercells of 34 candidate semiconductors, leading to a chemical space of approximately 12,000 points, 10% of which are used to generate a DFT dataset of charge-dependent defect formation energies. Descriptors based on tabulated elemental properties, defect coordination environment, and relevant semiconductor properties are used to train ML regression models for the DFT-computed neutral-state formation energies and charge transition levels of impurities. Optimized kernel ridge, Gaussian process, random forest, and neural network regression models are applied to screen impurities with lower formation energy than dominant native defects in all compounds.
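The train-then-screen workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a small pure-Python kernel ridge regression with an RBF kernel, two-component synthetic descriptors, and a toy energy function standing in for DFT. All names (`toy_energy`, `fit_krr`, `screen`, `candidate_i`), the hyperparameters, and the native-defect reference of 1.5 are assumptions made only for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(X, y, gamma=1.0, alpha=1e-3):
    """Kernel ridge regression: solve (K + alpha * I) w = y."""
    K = [[rbf_kernel(xi, xj, gamma) for xj in X] for xi in X]
    for i in range(len(X)):
        K[i][i] += alpha
    return solve_linear(K, y)


def predict_krr(X_train, weights, x, gamma=1.0):
    """Predict the target for one descriptor vector x."""
    return sum(w * rbf_kernel(xt, x, gamma) for w, xt in zip(weights, X_train))


def screen(candidates, X_train, weights, native_energy, gamma=1.0):
    """Keep candidates predicted to lie below the native-defect reference."""
    kept = []
    for name, desc in candidates:
        energy = predict_krr(X_train, weights, desc, gamma)
        if energy < native_energy:
            kept.append((name, energy))
    return sorted(kept, key=lambda item: item[1])


def toy_energy(desc):
    """Hypothetical stand-in for a DFT formation energy (synthetic, not real data)."""
    return 1.0 + 2.0 * desc[0] - desc[1]


rng = random.Random(0)
# Synthetic "chemical space" of 200 points with made-up 2D descriptors.
space = [(f"candidate_{i}", [rng.random(), rng.random()]) for i in range(200)]
# Label only 10% of the space, mirroring the sampling fraction in the abstract.
train = rng.sample(space, len(space) // 10)
X_train = [desc for _, desc in train]
y_train = [toy_energy(desc) for desc in X_train]
weights = fit_krr(X_train, y_train)
# Screen the full space against an assumed native-defect energy of 1.5.
hits = screen(space, X_train, weights, native_energy=1.5)
print(len(hits), "candidates fall below the reference energy")
```

In practice, a library implementation such as scikit-learn's kernel ridge, Gaussian process, or random forest regressors would replace the hand-written solver, with hyperparameters chosen by cross-validation; the hand-rolled version here only keeps the sketch free of dependencies.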

Original language: English (US)
Article number: 100450
Issue number: 3
State: Published - Mar 11 2022
Externally published: Yes


  • combinatorial screening
  • computational materials science
  • density functional theory
  • DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
  • high-throughput data
  • machine learning
  • materials informatics
  • mid-gap states
  • point defects
  • semiconductors

ASJC Scopus subject areas

  • Decision Sciences (all)

