TY - JOUR
T1 - PolyDAT
T2 - A Generic Data Schema for Polymer Characterization
AU - Lin, Tzyy Shyang
AU - Rebello, Nathan J.
AU - Beech, Haley K.
AU - Wang, Zi
AU - El-Zaatari, Bassil
AU - Lundberg, David J.
AU - Johnson, Jeremiah A.
AU - Kalow, Julia A.
AU - Craig, Stephen L.
AU - Olsen, Bradley D.
PY - 2021/3/22
Y1 - 2021/3/22
N2 - Polymers are stochastic materials that represent distributions of different molecules. In general, to quantify the distribution, polymer researchers rely on a series of chemical characterizations that each reveal partial information on the distribution. However, in practice, the exact set of characterizations that are carried out, as well as how the characterization data are aggregated and reported, is largely nonstandard across the polymer community. This scenario makes polymer characterization data highly disparate, thereby significantly slowing down the development of polymer informatics. In this work, a proposal on how structural characterization data can be organized is presented. To ensure that the system can apply universally across the entire polymer community, the proposed schema, PolyDAT, is designed to embody a minimal congruent set of vocabulary that is common across different domains. Unlike most chemical schemas, where only data pertinent to the species of interest are included, PolyDAT deploys a multi-species reaction network construct, in which every characterization on relevant species is collected to provide the most comprehensive profile on the polymer species of interest. Instead of maintaining a comprehensive list of available characterization techniques, PolyDAT provides a handful of generic templates, which align closely with experimental conventions and cover most types of common characterization techniques. This allows flexibility for the development and inclusion of new measurement methods. By providing a standard format to digitalize data, PolyDAT serves not only as an extension to BigSMILES that provides the necessary quantitative information but also as a standard channel for researchers to share polymer characterization data.
UR - http://www.scopus.com/inward/record.url?scp=85101806020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101806020&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.1c00028
DO - 10.1021/acs.jcim.1c00028
M3 - Article
C2 - 33615783
AN - SCOPUS:85101806020
SN - 1549-9596
VL - 61
SP - 1150
EP - 1163
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 3
ER -