ET-AL: Entropy-targeted active learning for bias mitigation in materials data

Hengrui Zhang, Wei (Wayne) Chen, James M. Rondinelli*, Wei Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Growing materials data and data-driven informatics are greatly accelerating the discovery and design of materials. While data-driven models have advanced significantly, the quality of the underlying data resources has received less attention, despite its large impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing that diversity differs among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate that ET-AL mitigates this bias and thereby improves downstream machine learning models. The approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as to data-driven informatics in other scientific domains.
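The abstract does not say exactly how the entropy metric is computed or how ET-AL ranks candidates, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each record is a (crystal system, scalar property) pair, estimates diversity per crystal system as the Shannon entropy H = -Σ p_i ln p_i of a fixed-width histogram of that property, and uses a hypothetical greedy rule: target the lowest-entropy crystal system and pick the candidate from it whose addition raises that system's entropy the most. The functions `shannon_entropy`, `entropy_by_group`, and `select_next`, the `bin_width` parameter, and the toy data are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# ASSUMPTION: each record is (crystal_system, scalar_property); using a single
# scalar property and fixed-width binning is illustrative only.


def shannon_entropy(values, bin_width):
    """Shannon entropy H = -sum_i p_i ln p_i (in nats) of values binned into fixed-width bins."""
    if not values:
        return 0.0
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_by_group(records, bin_width):
    """Per-crystal-system entropy: low values flag systems whose data are poorly spread."""
    groups = defaultdict(list)
    for system, value in records:
        groups[system].append(value)
    return {system: shannon_entropy(vals, bin_width) for system, vals in groups.items()}


def select_next(records, candidates, bin_width):
    """Hypothetical greedy step: target the lowest-entropy crystal system and return
    the candidate from that system whose addition raises its entropy the most
    (None if no candidate belongs to that system)."""
    entropies = entropy_by_group(records, bin_width)
    target = min(entropies, key=entropies.get)
    current = [v for s, v in records if s == target]
    pool = [c for c in candidates if c[0] == target]
    if not pool:
        return None
    return max(pool, key=lambda c: shannon_entropy(current + [c[1]], bin_width))


# Toy data (made up for illustration): all "triclinic" values fall in one bin.
labelled = [("cubic", 0.1), ("cubic", 1.2), ("cubic", 2.3), ("cubic", 3.4),
            ("triclinic", 0.1), ("triclinic", 0.2), ("triclinic", 0.3)]
candidates = [("triclinic", 0.4), ("triclinic", 2.5), ("cubic", 5.0)]

print(entropy_by_group(labelled, bin_width=1.0))
print(select_next(labelled, candidates, bin_width=1.0))  # → ('triclinic', 2.5)
```

Here the cubic entries span four bins (entropy ln 4), while the triclinic entries occupy a single bin (entropy 0), so the sketch targets the triclinic system and prefers the candidate that lands in a new bin. Fixed-width histograms are the simplest entropy estimator but are sensitive to the bin width; a realistic setting would more likely use a multi-dimensional feature space and a smoother density estimate.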

Original language: English (US)
Article number: 021403
Journal: Applied Physics Reviews
Volume: 10
Issue number: 2
State: Published - Jun 1 2023

Funding

This work was supported by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award No. DE-AR0001209, and the Center for Hierarchical Materials Design, under Award No. ChiMaD NIST 70NANB19H005. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. We also acknowledge Francesca M. Tavazza and Brian L. DeCost for assistance with data collection. H.Z. acknowledges Kyle D. Miller, Dale Gaines II, Adetoye H. Adekoya, Whitney Tso, and G. Jeffrey Snyder for insightful discussions, and Ke Sun for valuable advice on visualization.

ASJC Scopus subject areas

  • General Physics and Astronomy

