TY - JOUR
T1 - Developing a standardized but extendable framework to increase the findability of infectious disease datasets
AU - the NIAID Systems Biology Data Dissemination Working Group
AU - Tsueng, Ginger
AU - Cano, Marco A. Alvarado
AU - Bento, José
AU - Czech, Candice
AU - Kang, Mengjia
AU - Pache, Lars
AU - Rasmussen, Luke V.
AU - Savidge, Tor C.
AU - Starren, Justin
AU - Wu, Qinglong
AU - Xin, Jiwen
AU - Yeaman, Michael R.
AU - Zhou, Xinghua
AU - Su, Andrew I.
AU - Wu, Chunlei
AU - Brown, Liliana
AU - Shabman, Reed S.
AU - Hughes, Laura D.
AU - Turkarslan, Serdar
N1 - Funding Information:
This work was supported in part by the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH) grants U01 AI124290 (Baylor: TS, QW), U01 AI124302 (Boston College: JB), U19 AI135995 (Scripps Research: MAAC, LDH, GT, AIS, CW, JX, XZ), U19 AI135964 (Northwestern: MK, LVR, JS), U19 AI135972 (Sanford Burnham Prebys: LP); U01 AI124319 (UCLA: MY), 75N91019D00024 (Scripps Research: CZ, GT, LDH, AIS, CW); National Center for Advancing Translational Sciences NIH grant U24 TR002306 (Scripps Research: MAAC, LDH, GT, AIS, CW, JX, XZ); and National Institute of General Medical Sciences grant R01 GM083924 (Scripps Research: MAAC, GT, AIS, CW, JX, XZ). We acknowledge the NIAID/DMID Systems Biology Consortium for Infectious Diseases Data Dissemination Working Group for developing the NIAID SysBio schemas, registering center-created datasets and computational tools, and providing critical feedback on the manuscript. We thank Reed Shabman for his leadership within the Data Dissemination Working Group, coordinating with centers to register datasets and tools, and helpful comments and careful revisions of the paper. We additionally thank Liliana Brown for the support of the Program this paper originated from and Serdar Turkarslan, Ishwar Chandramouliswaran, Wilbert van Panhuis, and Jack DiGiovanna for helpful discussions in preparing this manuscript.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
AB - Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
UR - http://www.scopus.com/inward/record.url?scp=85148843487&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148843487&partnerID=8YFLogxK
U2 - 10.1038/s41597-023-01968-9
DO - 10.1038/s41597-023-01968-9
M3 - Article
C2 - 36823157
AN - SCOPUS:85148843487
SN - 2052-4463
VL - 10
JO - Scientific Data
JF - Scientific Data
IS - 1
M1 - 99
ER -