Evaluating measurement equivalence using the item response theory log-likelihood ratio (IRTLR) method to assess differential item functioning (DIF): Applications (with illustrations) to measures of physical functioning ability and general distress

Jeanne A. Teresi*, Katja Ocepek-Welikson, Marjorie Kleinman, Karon F. Cook, Paul K. Crane, Laura E. Gibbons, Leo S. Morales, Maria Orlando-Edelen, David Cella

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

51 Scopus citations

Abstract

Background: Methods based on item response theory (IRT) that can be used to examine differential item functioning (DIF) are illustrated. An IRT-based approach to the detection of DIF was applied to physical function and general distress item sets. DIF was examined with respect to gender, age and race. The method used for DIF detection was the item response theory log-likelihood ratio (IRTLR) approach. DIF magnitude was measured using the differences in the expected item scores, expressed as the unsigned probability differences, and calculated using the non-compensatory DIF index (NCDIF). Finally, impact was assessed using expected scale scores, expressed as group differences in the total test (measure) response functions.

Methods: The example for the illustration of the methods came from a study of 1,714 patients with cancer or HIV/AIDS. The measure contained 23 items measuring physical functioning ability and 15 items addressing general distress, scored in the positive direction.

Results: The substantive findings were of relatively small magnitude DIF. In total, six items showed relatively larger magnitude DIF (expected item score differences greater than the cutoff) with respect to physical function across the three comparisons: "trouble with a long walk" (race), "vigorous activities" (race, age), "bending, kneeling stooping" (age), "lifting or carrying groceries" (race), "limited in hobbies, leisure" (age), "lack of energy" (race). None of the general distress items evidenced high magnitude DIF, although "worrying about dying" showed some DIF with respect to both age and race, after adjustment.

Conclusions: The fact that many physical function items showed DIF with respect to age, even after adjustment for multiple comparisons, indicates that the instrument may be performing differently for these groups. While the magnitude and impact of DIF at the item and scale level were minimal, caution should be exercised in the use of subsets of these items, as might occur with selection for clinical decisions or computerized adaptive testing. The selection of anchor items and the criteria for DIF detection, including the integration of significance and magnitude measures, remain issues requiring investigation. Further research is needed regarding the criteria and guidelines appropriate for DIF detection in the context of health-related items.
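As a rough illustration of the two quantities named in the abstract, the sketch below computes a log-likelihood ratio statistic from two nested model fits and an NCDIF value from expected item scores. It is not the authors' implementation: it assumes a graded response model for the items, and the function names, item parameters, ability values and log-likelihoods are all hypothetical. NCDIF is taken here in its usual form, the mean squared difference between expected item scores under focal-group and reference-group parameters, averaged over focal-group ability estimates.

```python
import math


def grm_category_probs(theta, a, bs):
    """Category probabilities under a graded response model (assumed here).

    a  : item discrimination
    bs : ordered category thresholds (len(bs) + 1 response categories)
    """
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


def expected_item_score(theta, a, bs):
    """Expected item score: sum over categories of k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, bs)))


def ncdif(focal_thetas, ref_params, focal_params):
    """Mean squared expected-score difference over focal-group abilities."""
    diffs = [
        expected_item_score(t, *focal_params) - expected_item_score(t, *ref_params)
        for t in focal_thetas
    ]
    return sum(d * d for d in diffs) / len(diffs)


def lr_statistic(loglik_constrained, loglik_free):
    """G^2 = -2 * (logL of the constrained model - logL of the freed model)."""
    return -2.0 * (loglik_constrained - loglik_free)


# Hypothetical values, for illustration only.
focal_thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]
ref_item = (1.2, [-1.0, 0.0, 1.0])    # (a, thresholds), reference group
focal_item = (1.2, [-0.7, 0.3, 1.3])  # thresholds shifted for the focal group

print("NCDIF:", ncdif(focal_thetas, ref_item, focal_item))
print("G^2:", lr_statistic(-1000.0, -995.0))
```

In an IRTLR analysis the statistic would be compared with a chi-square distribution whose degrees of freedom equal the number of parameters freed for the studied item; the cutoff applied to the magnitude measure is a separate choice that this sketch does not address.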

Original language: English (US)
Pages (from-to): 43-68
Number of pages: 26
Journal: Quality of Life Research
Volume: 16
Issue number: SUPPL. 1
State: Published - Aug 2007

Funding

Data were collected as part of the Quality of Life Evaluation in Oncology Project funded by the National Cancer Institute (RO1 CA 60068, David Cella PI). This study was of patients with cancer or HIV/AIDS. Data were analyzed with respect to age, gender and race. The sample sizes used in the analyses shown in the Figures and Tables were 236 African-Americans and 1324 whites, 719 females and 914 …

Acknowledgements

The authors thank Douglas Holmes, Ph.D. for his review of several versions of this manuscript. The authors also thank three anonymous reviewers and the editor for their helpful comments related to an earlier version of this manuscript. These analyses were conducted on behalf of the Statistical Coordinating Center to the Patient Reported Outcomes Information System (PROMIS) (AR052177). Funding for analyses was provided in part by the National Institute on Aging, Resource Center for Minority Aging Research at Columbia University (AG15294), and by the National Cancer Institute through the Veteran’s Administration Measurement Excellence and Training Resource Information Center (METRIC). An earlier version of this paper was presented at the National Institutes of Health Conference on Patient Reported Outcomes, Bethesda, June, 2004.

Keywords

  • Differential item functioning
  • General distress
  • Item response theory
  • Physical functioning

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
