Rationale and Objectives: The goal of this study was to determine the accuracy and precision of using scores from a receiver operating characteristic rating scale to estimate sensitivity and specificity. Materials and Methods: We used data collected in a previous study that measured the improvements in radiologists' ability to classify mammographic microcalcification clusters as benign or malignant with and without the use of a computer-aided diagnosis scheme. Sensitivity and specificity were estimated from the rating data from a question that directly asked the radiologists their biopsy recommendations, which was used as the "truth," because it is the actual recall decision, thus it is their subjective truth. By thresholding the rating data, sensitivity and specificity were estimated for different threshold values. Results: Because of interreader and intrareader variability, estimated sensitivity and specificity values for individual readers could be as much as 100% in error when using rating data compared to using the biopsy recommendation data. When pooled together, the estimates using thresholding the rating data were in good agreement with sensitivity and specificity estimated from the recommendation data. However, the statistical power of the rating data estimates was lower. Conclusions: By simply asking the observer his or her explicit recommendation (eg, biopsy or no biopsy), sensitivity and specificity can be measured directly, giving a more accurate description of empirical variability and the power of the study can be maximized.
- Computer-aided diagnosis
- Observer studies
- Receiver operating characteristic analysis
- Technology assessment
ASJC Scopus subject areas
- Radiology Nuclear Medicine and imaging