Assessment Scores of a Mock Objective Structured Clinical Examination Administered to 99 Anesthesiology Residents at 8 Institutions

Pedro Tanaka*, Yoon Soo Park, Linda Liu, Chelsia Varner, Amanda H. Kumar, Charandip Sandhu, Roya Yumul, Kate Tobin McCartney, Jared Spilka, Alex MacArio

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

2 Scopus citations


BACKGROUND: Objective Structured Clinical Examinations (OSCEs) are used in a variety of high-stakes examinations. The primary goal of this study was to examine factors influencing the variability of assessment scores for mock OSCEs administered to senior anesthesiology residents. METHODS: Using the American Board of Anesthesiology (ABA) OSCE Content Outline as a blueprint, scenarios were developed for 4 of the ABA skill types: (1) informed consent, (2) treatment options, (3) interpretation of echocardiograms, and (4) application of ultrasonography. Eight residency programs administered these 4 OSCEs to CA3 residents during a 1-day formative session. A global score and checklist items were used for scoring by faculty raters. We used a statistical framework called generalizability theory, or G-theory, to estimate the sources of variation (or facets), and to estimate the reliability (ie, reproducibility) of the OSCE performance scores. Reliability provides a metric on the consistency or reproducibility of learner performance as measured through the assessment. RESULTS: Of the 115 total eligible senior residents, 99 participated in the OSCE because the other residents were unavailable. Overall, residents correctly performed 84% (standard deviation [SD] 16%, range 38%-100%) of the 36 total checklist items for the 4 OSCEs. On global scoring, the pass rate for the informed consent station was 71%, for treatment options was 97%, for interpretation of echocardiograms was 66%, and for application of ultrasound was 72%. The estimate of reliability expressing the reproducibility of examinee rankings equaled 0.56 (95% confidence interval [CI], 0.49-0.63), which is reasonable for normative assessments that aim to compare a resident's performance relative to other residents because over half of the observed variation in total scores is due to variation in examinee ability. Phi coefficient reliability of 0.42 (95% CI, 0.35-0.50) indicates that criterion-based judgments (eg, pass-fail status) cannot be made. Phi expresses the absolute consistency of a score and reflects how closely the assessment is likely to reproduce an examinee's final score. Overall, the greatest (14.6%) variance was due to the person by item by station interaction (3-way interaction) indicating that specific residents did well on some items but poorly on other items. The variance (11.2%) due to residency programs across case items was high suggesting moderate variability in performance from residents during the OSCEs among residency programs. CONCLUSIONS: Since many residency programs aim to develop their own mock OSCEs, this study provides evidence that it is possible for programs to create a meaningful mock OSCE experience that is statistically reliable for separating resident performance.

Original languageEnglish (US)
Pages (from-to)613-621
Number of pages9
JournalAnesthesia and analgesia
Issue number2
StatePublished - Aug 1 2020
Externally publishedYes

ASJC Scopus subject areas

  • Anesthesiology and Pain Medicine


Dive into the research topics of 'Assessment Scores of a Mock Objective Structured Clinical Examination Administered to 99 Anesthesiology Residents at 8 Institutions'. Together they form a unique fingerprint.

Cite this