Faculty and resident evaluations of medical students on a surgery clerkship correlate poorly with standardized exam scores

Seth D. Goldstein*, Brenessa Lindeman, Jorie Colbert-Getz, Trisha Arbella, Robert Dudas, Anne Lidor, Bethany Sacks

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Scopus citations


Background The clinical knowledge of medical students on a surgery clerkship is routinely assessed via subjective evaluations from faculty members and residents. Interpretation of these ratings should ideally be valid and reliable. However, prior literature has questioned the correlation between subjective and objective components when assessing students' clinical knowledge.
Methods Retrospective cross-sectional data were collected from medical student records at The Johns Hopkins University School of Medicine from July 2009 through June 2011. Surgical faculty members and residents rated students' clinical knowledge on a 5-point, Likert-type scale. Interrater reliability was assessed using intraclass correlation coefficients for students with ≥4 attending surgeon evaluations (n = 216) and ≥4 resident evaluations (n = 207). Convergent validity was assessed by correlating average evaluation ratings with scores on the National Board of Medical Examiners (NBME) clinical subject examination for surgery. Average resident and attending surgeon ratings were also compared by NBME quartile using analysis of variance.
Results There were high degrees of reliability for resident ratings (intraclass correlation coefficient, .81) and attending surgeon ratings (intraclass correlation coefficient, .76). Resident and attending surgeon ratings shared a moderate degree of variance (19%). However, average resident ratings and average attending surgeon ratings shared only a small degree of variance with NBME surgery examination scores (ρ² ≤ .09). When ratings were compared among NBME quartile groups, the only significant difference was in residents' ratings of students in the bottom quartile of scores compared with students in the top quartile (P = .007).
Conclusions Although high interrater reliability suggests that attending surgeons and residents rate students with consistency, the lack of convergent validity suggests that these ratings may not be reflective of actual clinical knowledge. Both faculty members and residents may benefit from training in knowledge assessment, which will likely increase opportunities to recognize deficiencies and make student evaluation a more valuable tool.
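The shared-variance figures in the Results can be read back as correlation magnitudes, since shared variance is the square of the correlation coefficient. The following is a minimal sketch of that arithmetic only. The function name `correlation_magnitude` is hypothetical and is not part of the study's analysis; the two inputs are the values reported in the abstract (19% and an upper bound of .09). The sign of each correlation cannot be recovered from the shared variance alone.

```python
import math


def correlation_magnitude(shared_variance: float) -> float:
    """Return |r| for a proportion of shared variance (r squared).

    Hypothetical helper for illustration; not taken from the study.
    """
    if not 0.0 <= shared_variance <= 1.0:
        raise ValueError("shared variance must be between 0 and 1")
    return math.sqrt(shared_variance)


# Values reported in the abstract.
resident_vs_attending = correlation_magnitude(0.19)  # residents vs. attendings
ratings_vs_nbme_upper = correlation_magnitude(0.09)  # upper bound, ratings vs. NBME

print(f"{resident_vs_attending:.2f}")  # 0.44
print(f"{ratings_vs_nbme_upper:.2f}")  # 0.30
```

Read this way, resident and attending ratings correlate with each other at roughly .44 in magnitude, while neither set of ratings correlates with NBME scores at more than about .30.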

Original language: English (US)
Pages (from-to): 231-235
Number of pages: 5
Journal: American Journal of Surgery
Issue number: 2
State: Published - Feb 2014


  • Assessment
  • Medical student education
  • Surgery clerkship

ASJC Scopus subject areas

  • Surgery

