Transforming Big Data into AI-ready data for nutrition and obesity research

Diana M. Thomas*, Rob Knight, Jack A. Gilbert, Marilyn C. Cornelis, Marie G. Gantz, Kate Burdekin, Kevin Cummiskey, Susan C.J. Sumner, Wimal Pathmasiri, Edward Sazonov, Kelley Pettee Gabriel, Erin E. Dooley, Mark A. Green, Andrew Pfluger, Samantha Kleinberg

*Corresponding author for this work

Research output: Contribution to journal; Review article; peer-review

3 Scopus citations

Abstract

Objective: Big Data are increasingly used in obesity and nutrition research to gain new insights and derive personalized guidance; however, these data are often unusable in raw form. Substantial preprocessing, which requires machine learning (ML), human judgment, and specialized software, is needed to transform Big Data into artificial intelligence (AI)- and ML-ready data. These preprocessing steps are the most complex part of the entire modeling pipeline. End users' understanding of this complexity is critical for reducing misunderstanding, faulty interpretation, and erroneous downstream conclusions.

Methods: We reviewed three popular obesity/nutrition Big Data sources: microbiome, metabolomics, and accelerometry. For each, we detailed the preprocessing pipeline, the specialized software, the main challenges, and how preprocessing decisions affect the final AI- and ML-ready products.

Results: We present opportunities for advances that improve quality control, preprocessing speed, and intelligent end-user consumption.

Conclusions: Big Data have exciting potential for identifying new modifiable factors relevant to obesity research. However, to ensure that conclusions arising from Big Data are interpreted accurately, the choices involved in preparing AI- and ML-ready data need to be transparent to the investigators and clinicians who rely on those conclusions.

Original language: English (US)
Pages (from-to): 857-870
Number of pages: 14
Journal: Obesity
Volume: 32
Issue number: 5
State: Published - May 2024

Funding

Diana M. Thomas and Kevin Cummiskey were supported by the National Institutes of Health (NIH) Interagency Agreement AOD22022001. Samantha Kleinberg was supported in part by the NIH under U54TR004279. Marie G. Gantz was supported by U24HD107676. Edward Sazonov was supported by U24CA268228, and Rob Knight and Jack A. Gilbert were supported by 1U24DK131617-01. Kelley Pettee Gabriel and Erin E. Dooley were supported by UG1HD107688. Andrew Pfluger was supported by the US Department of Defense Environmental Security Technology Certification Program (ESTCP) grant EW22-7278, and Susan C. J. Sumner was supported by U24CA268153.

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Endocrinology, Diabetes and Metabolism
  • Endocrinology
  • Nutrition and Dietetics
