Infant polysomnography: Reliability

David H. Crowell*, Lee J. Brooks, Theodore Colton, Michael J. Corwin, Toke T. Hoppenbrouwers, Carl E. Hunt, Linda E. Kapuniai, George Lister, Michael R. Neuman, Mark Peucker, Sally L. Davidson Ward, Debra E. Weese-Mayer, Marian Willinger, Terry M. Baird, David Hufford, Thomas G. Keens, Richard J. Martin, Rangasamy Ramanathan, Susan C. Schafer, Jean M. Silvestri, Larry Tinsley

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

54 Scopus citations


Infant polysomnography (IPSG) is an increasingly important procedure for studying infants with sleep and breathing disorders. Because analyses of IPSG data are subjective, an equally important issue is the reliability, or strength of agreement, with which scorers (especially experienced clinicians) rate sleep parameters (SP) and sleep states (SS). One basic aspect of this problem was examined by proposing and testing the hypothesis that infant SP and SS can be reliably scored at substantial levels of agreement, that is, kappa (κ) ≥ 0.61. In light of the importance of IPSG reliability in the Collaborative Home Infant Monitoring Evaluation (CHIME) study, a reliability training and evaluation process was developed and implemented. Training on SP and SS scoring was based on CHIME criteria, which modified and supplemented those of Anders, Emde, and Parmelee (10). The κ statistic was adopted as the method for evaluating reliability between and among scorers. Scorers were three experienced investigators and four trainees. Inter- and intrarater reliabilities for SP codes and SSs were calculated for 408 randomly selected 30-second epochs of nocturnal IPSG recorded at five CHIME clinical sites from enrolled subjects: healthy full-term infants (n = 5), preterm infants (n = 4), infants with apnea of infancy (n = 2), and siblings of infants who died of sudden infant death syndrome (SIDS) (n = 4). IPSG data set 1 was scored by both the experienced investigators and the trained scorers and was used to assess initial interrater reliability. IPSG data set 2 was scored twice by the trained scorers and was used to reassess interrater reliability and to assess intrarater reliability. The κs for SS ranged from 0.45 to 0.58 for data set 1, representing a moderate level of agreement. Therefore, rater disagreements were reviewed, and the scoring criteria were modified to clarify ambiguities.
The κs and confidence intervals (CIs) computed for data set 2 showed substantial interrater and intrarater agreement for the four trained scorers: for SS, κ = 0.68, and for SP, the κs ranged from 0.62 to 0.76. Acceptance of the hypothesis supports the conclusion that the IPSG is a reliable source of clinical and research data when supported by significant κs and CIs. Reliability can be maximized with strictly detailed scoring guidelines and training.
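
As a rough illustration of how a κ value and its agreement label relate, the following is a minimal Python sketch of Cohen's κ for two raters scoring the same epochs. It rests on several assumptions: the abstract does not say which κ variant the study used for agreement among several scorers, the state labels ("QS", "AS", "IS", "AW") and the ten epochs are hypothetical example data, and the function names `cohen_kappa` and `agreement_level` are invented here. The agreement bands are the commonly cited Landis and Koch ranges, which match the abstract's "moderate" and "substantial" wording.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same sequence of epochs."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of epochs")
    n = len(rater_a)
    # Observed agreement: fraction of epochs given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions, summed.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch simply reports full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_level(kappa):
    """Map kappa (as reported to two decimals) to a Landis and Koch band."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


# Hypothetical sleep-state scores for ten 30-second epochs (not study data).
scorer_1 = ["QS", "QS", "AS", "AS", "AS", "AW", "QS", "AS", "IS", "AW"]
scorer_2 = ["QS", "QS", "AS", "AS", "QS", "AW", "QS", "AS", "AS", "AW"]

kappa = cohen_kappa(scorer_1, scorer_2)
print(round(kappa, 2), agreement_level(kappa))  # 0.71 substantial
```

In this made-up example the scorers agree on 8 of 10 epochs (observed agreement 0.80) while chance agreement is 0.32, giving κ = 0.48 / 0.68 ≈ 0.71. Under the same bands, the data set 1 values of 0.45 to 0.58 fall in the "moderate" range and the data set 2 value of 0.68 falls in the "substantial" range.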

Original language: English (US)
Pages (from-to): 553-560
Number of pages: 8
Issue number: 7
State: Published - 1997


  • Infant
  • Polysomnography
  • Reliability
  • SIDS
  • Scoring

ASJC Scopus subject areas

  • Clinical Neurology
  • Physiology (medical)

