Background: In the development of health outcome measures, the pool of candidate items may be divided into multiple forms, thus "spreading" response burden over two or more study samples. Item responses collected using this approach result in two or more forms whose scores are not equivalent. Therefore, the item responses must be equated (adjusted) to a common mathematical metric. Objectives: The purpose of this study was to examine the effect of sample size, test size, and selection of item response theory model in equating three forms of a health status measure. Each of the forms was comprised of a set of items unique to it and a set of anchor items common across forms. Research Design: The study was a secondary data analysis of patients' responses to the developmental item pool for the Health of Seniors Survey. A completely crossed design was used with 25 replications per study cell. Results: We found that the quality of equatings was affected greatly by sample size. Its effect was far more substantial than choice of IRT model. Little or no advantage was observed for equatings based on 60 or 72 items versus those based on 48 items. Conclusions: We concluded that samples of less than 300 are clearly unacceptable for equating multiple forms. Additional sample size guidelines are offered based on our results.
|Original language||English (US)|
|Number of pages||15|
|Journal||Journal of applied measurement|
|State||Published - 2007|
ASJC Scopus subject areas