A logistic regression model based on the national mammography database format to aid breast cancer diagnosis

Jagpreet Chhatwal, Oguzhan Alagoz, Mary J. Lindstrom, Charles E. Kahn, Katherine A. Shaffer, Elizabeth S. Burnside

Research output: Contribution to journal › Article › peer-review

59 Scopus citations


OBJECTIVE. The purpose of our study was to create a breast cancer risk estimation model based on the descriptors of the National Mammography Database using logistic regression that can aid in decision making for the early detection of breast cancer. MATERIALS AND METHODS. We created two logistic regression models based on the mammography features and demographic data for 62,219 consecutive mammography records from 48,744 studies in 18,270 patients reported using the Breast Imaging Reporting and Data System (BI-RADS) lexicon and the National Mammography Database format between April 5, 1999 and February 9, 2004. State cancer registry outcomes matched with our data served as the reference standard. The probability of cancer was the outcome in both models. Model 2 was built using all variables in Model 1 plus radiologists' BI-RADS assessment categories. We used 10-fold cross-validation to train and test the model and to calculate the area under the receiver operating characteristic curves (A_z) to measure the performance. Both models were compared with the radiologists' BI-RADS assessments. RESULTS. Radiologists achieved an A_z value of 0.939 ± 0.011. The A_z was 0.927 ± 0.015 for Model 1 and 0.963 ± 0.009 for Model 2. At 90% specificity, the sensitivity of Model 2 (90%) was significantly better (p < 0.001) than that of radiologists (82%) and Model 1 (83%). At 85% sensitivity, the specificity of Model 2 (96%) was significantly better (p < 0.001) than that of radiologists (88%) and Model 1 (87%). CONCLUSION. Our logistic regression model can effectively discriminate between benign and malignant breast disease and can identify the most important features associated with breast cancer.
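The two performance measures reported in the abstract, the area under the ROC curve (A_z) and the sensitivity at a fixed specificity, can be illustrated with a minimal sketch. It assumes the nonparametric rank (Mann-Whitney) formulation of the area, which may differ from the estimator the authors used. The function names `auc` and `sensitivity_at_specificity` and the toy scores are hypothetical and are not study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Equals the probability that a randomly chosen malignant case (label 1)
    scores higher than a randomly chosen benign case (label 0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    scores: Sequence[float], labels: Sequence[int], min_specificity: float
) -> float:
    """Best sensitivity over thresholds whose specificity >= min_specificity.

    A case is called positive when its score is >= the threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    best = 0.0
    # Candidate thresholds: each observed score, plus +inf (call nothing positive).
    for t in sorted(set(scores)) + [float("inf")]:
        specificity = sum(n < t for n in neg) / len(neg)
        sensitivity = sum(p >= t for p in pos) / len(pos)
        if specificity >= min_specificity and sensitivity > best:
            best = sensitivity
    return best


# Hypothetical predicted probabilities of cancer (illustration only).
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

toy_auc = auc(toy_scores, toy_labels)
toy_sens_90 = sensitivity_at_specificity(toy_scores, toy_labels, 0.90)
toy_sens_80 = sensitivity_at_specificity(toy_scores, toy_labels, 0.80)
```

In a 10-fold cross-validation setting such as the one described, these functions would be applied to the held-out predictions of each fold rather than to scores from the training data.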

Original language: English (US)
Pages (from-to): 1117-1127
Number of pages: 11
Journal: American Journal of Roentgenology
Issue number: 4
State: Published - Apr 2009
Externally published: Yes


Keywords

  • Logistic regression
  • Mammography
  • National mammography database
  • Risk prediction

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging

