TY - JOUR
T1 - The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge
T2 - A multi-institute evaluation and analysis framework on a standardized dataset
AU - IWOAI Segmentation Challenge Writing Group
AU - Desai, Arjun D.
AU - Caliva, Francesco
AU - Iriondo, Claudia
AU - Mortazi, Aliasghar
AU - Jambawalikar, Sachin
AU - Bagci, Ulas
AU - Perslev, Mathias
AU - Igel, Christian
AU - Dam, Erik B.
AU - Gaj, Sibaji
AU - Yang, Mingrui
AU - Li, Xiaojuan
AU - Deniz, Cem M.
AU - Juras, Vladimir
AU - Regatte, Ravinder
AU - Gold, Garry E.
AU - Hargreaves, Brian A.
AU - Pedoia, Valentina
AU - Chaudhari, Akshay S.
N1 - Funding Information:
Supported by National Institutes of Health grants (R01 AR063643, R01 EB002524, K24 AR062068, P41 EB015891, R00 AR070902, R61 AR073552, and R01 AR074453); a National Science Foundation grant (DGE 1656518); grants from GE Healthcare and Philips (research support); and a Stanford University Department of Radiology Precision Health and Integrated Diagnostics Seed Grant. M.P. was supported by a grant from the Independent Research Fund Denmark, project U-Sleep, project number 9131-00099B. C. Igel was supported by a grant from the Danish Council for Independent Research for the project U-Sleep. Image data were acquired from the Osteoarthritis Initiative (OAI). The OAI is a public-private partnership composed of five contracts (N01-AR-2-2258, N01-AR-2-2259, N01-AR-2-2260, N01-AR-2-2261, N01-AR-2-2262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories, Novartis Pharmaceuticals, GlaxoSmithKline, and Pfizer. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health.
Funding Information:
Disclosures of Conflicts of Interest: A.D.D. Activities related to the present article: grants and travel support from the National Science Foundation, the National Institute of Arthritis and Musculoskeletal and Skin Diseases, the National Institute of Biomedical Imaging and Bioengineering, GE Healthcare, and Philips. Activities not related to the present article: grants from the National Institutes of Health. Other relationships: disclosed no relevant relationships. F.C. disclosed no relevant relationships. C. Iriondo disclosed no relevant relationships. A.M. disclosed no relevant relationships. S.J. disclosed no relevant relationships. U.B. disclosed no relevant relationships. M.P. Activities related to the present article: grant from the Independent Research Fund Denmark. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. C. Igel Activities related to the present article: grant from the Danish Council for Independent Research. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. E.B.D. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: stockholder in Biomediq and Cerebriu. Other relationships: disclosed no relevant relationships. S.G. disclosed no relevant relationships. M.Y. disclosed no relevant relationships. X.L. disclosed no relevant relationships. C.M.D. Activities related to the present article: grant from the National Institute of Arthritis and Musculoskeletal and Skin Diseases. Activities not related to the present article: disclosed no relevant relationships. Other relationships: disclosed no relevant relationships. V.J. disclosed no relevant relationships. R.R. disclosed no relevant relationships. G.E.G. Activities related to the present article: grants from the National Institutes of Health. 
Activities not related to the present article: board member for HeartVista; consultant for Canon; grants from GE Healthcare. Other relationships: disclosed no relevant relationships. B.A.H. Activities related to the present article: grant from the National Institutes of Health. Activities not related to the present article: royalties from patents licensed by Siemens and GE Healthcare; stockholder in LVIS. Other relationships: disclosed no relevant relationships. V.P. disclosed no relevant relationships. A.S.C. Activities related to the present article: grants from the National Institutes of Health, GE Healthcare, and Philips. Activities not related to the present article: board member for BrainKey and Chondrometrics; consultant for Skope, Subtle Medical, Chondrometrics, Image Analysis Group, Edge Analytics, ICM, and Culvert Engineering; stockholder in Subtle Medical, LVIS, and BrainKey; travel support from Paracelsus Medical Private University. Other relationships: disclosed no relevant relationships.
Publisher Copyright:
© RSNA, 2021.
PY - 2021/3
Y1 - 2021/3
N2 - Purpose: To organize a multi-institute knee MRI segmentation challenge for characterizing the semantic and clinical efficacy of automatic segmentation methods relevant for monitoring osteoarthritis progression. Materials and Methods: A dataset partition consisting of three-dimensional knee MRI from 88 retrospective patients at two time points (baseline and 1-year follow-up) with ground truth articular (femoral, tibial, and patellar) cartilage and meniscus segmentations was standardized. Challenge submissions and a majority-vote ensemble were evaluated against ground truth segmentations using Dice score, average symmetric surface distance, volumetric overlap error, and coefficient of variation on a holdout test set. Similarities in automated segmentations were measured using pairwise Dice coefficient correlations. Articular cartilage thickness was computed longitudinally and with scans. Correlation between thickness error and segmentation metrics was measured using the Pearson correlation coefficient. Two empirical upper bounds for ensemble performance were computed using combinations of model outputs that consolidated true positives and true negatives. Results: Six teams (T1–T6) submitted entries for the challenge. No differences were observed across any segmentation metrics for any tissues (P = .99) among the four top-performing networks (T2, T3, T4, T6). Dice coefficient correlations between network pairs were high (> 0.85). Per-scan thickness errors were negligible among networks T1–T4 (P = .99), and longitudinal changes showed minimal bias (< 0.03 mm). Low correlations (r < 0.41) were observed between segmentation metrics and thickness error. The majority-vote ensemble was comparable to top-performing networks (P = .99). Empirical upper-bound performances were similar for both combinations (P = .99).
Conclusion: Diverse networks learned to segment the knee similarly, where high segmentation accuracy did not correlate with cartilage thickness accuracy and voting ensembles did not exceed individual network performance.
KW - Cartilage
KW - Knee
KW - MR-Imaging
KW - Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85107799764&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107799764&partnerID=8YFLogxK
U2 - 10.1148/ryai.2021200078
DO - 10.1148/ryai.2021200078
M3 - Article
C2 - 34235438
AN - SCOPUS:85107799764
VL - 3
JO - Radiology: Artificial Intelligence
JF - Radiology: Artificial Intelligence
SN - 2638-6100
IS - 3
M1 - e200078
ER -