OCT5k: A dataset of multi-disease and multi-graded annotations for retinal layers

Mustafa Arikan, James Willoughby, Sevim Ongun, Ferenc Sallo, Andrea Montesel, Hend Ahmed, Ahmed Hagag, Marius Book, Henrik Faatz, Maria Vittoria Cicinelli, Amani A. Fawzi, Dominika Podkowinski, Marketa Cilkova, Diana Morais De Almeida, Moussa Zouache, Ganesham Ramsamy, Watjana Lilaonitkul, Adam M. Dubis*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Publicly available open-access OCT datasets for retinal layer segmentation have been limited in scope: they are often small, specific to a single disease, or contain only one grading. This dataset addresses these limitations by providing multi-grader, multi-disease labels for training machine learning-based algorithms. It covers three subsets of scans (Age-related Macular Degeneration, Diabetic Macular Edema, and healthy) and provides annotations for two types of tasks (semantic segmentation and object detection). The dataset comprises 5016 pixel-wise manual labels for 1672 OCT scans, featuring 5 layer boundaries across the three scan subsets, to support the development of automatic techniques. A subset of the data (566 scans) was subsequently labeled for disease features, yielding 4698 bounding box annotations across 9 classes of disease biomarkers. To minimize bias, images were shuffled and distributed among graders. Retinal layer annotations were corrected, with outliers identified using the interquartile range (IQR). This step was repeated three times, improving the quality of the layer annotations with each iteration and ensuring a reliable dataset for automated retinal image analysis.
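
The abstract does not say how the IQR criterion was applied, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a per-scan disagreement score (mean absolute deviation of each grader's boundary from the per-column median) and flags scores outside the conventional Tukey fences at 1.5 × IQR. The function names, the score definition, and the factor `k` are all hypothetical choices made for illustration.

```python
import statistics

def boundary_disagreement(annotations):
    """Mean absolute deviation from the per-column median boundary.

    `annotations` is a list of graders' traces for one layer boundary in one
    scan; each trace is a list of row positions, one per image column.
    This score is an assumption for illustration, not taken from the paper.
    """
    columns = list(zip(*annotations))  # regroup values by image column
    deviations = []
    for column in columns:
        consensus = statistics.median(column)
        deviations.extend(abs(value - consensus) for value in column)
    return sum(deviations) / len(deviations)

def iqr_outliers(values, k=1.5):
    """Return a list of booleans marking values outside the Tukey fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [value < lower or value > upper for value in values]

# Hypothetical disagreement scores for six scans; the last one stands out.
scores = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0]
flags = iqr_outliers(scores)
to_review = [index for index, flagged in enumerate(flags) if flagged]
```

Under this reading, the flagged scans would be sent back to graders for correction and the scores recomputed, which matches the repeated correct-and-check cycle the abstract describes.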

Original language: English (US)
Article number: 267
Journal: Scientific Data
Volume: 12
Issue number: 1
State: Published - Dec 2025

Funding

The authors acknowledge the use of the Joint Academic Data science Endeavour (JADE) Tier 2 computing facility funded by the Engineering and Physical Sciences Research Council (EPSRC), UK. This research was supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust, UCL Institute of Ophthalmology and UCL Institute of Healthcare Engineering. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was supported by National Institutes of Health Core Grant (EY014800), and an Unrestricted Grant from Research to Prevent Blindness, New York, NY, to the Department of Ophthalmology & Visual Sciences, University of Utah.

ASJC Scopus subject areas

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences
