Background: Variability in endoscopic assessment necessitates rigorous investigation of the descriptors used to score the severity of ulcerative colitis (UC).
Objective: To evaluate variation in the overall endoscopic assessment of severity and the intra- and interindividual variation of descriptive terms, and to create an Ulcerative Colitis Endoscopic Index of Severity that could be validated.
Design: A two-phase study used a library of 670 video sigmoidoscopies from patients with Mayo Clinic scores 0-11, supplemented by 10 videos from five people without UC and five hospitalised patients with acute severe UC. In phase 1, each of 10 investigators viewed 16 of 24 videos to assess agreement on the Baron score with a central reader and agreed definitions of 10 endoscopic descriptors. In phase 2, each of 30 different investigators rated 25 of 60 different videos for the descriptors and assessed overall severity on a 0-100 visual analogue scale. κ statistics tested inter- and intraobserver variability for each descriptor. A general linear mixed regression model based on a logit link and a β distribution of variance was used to predict overall endoscopic severity from the descriptors.
Results: There was 76% agreement for 'severe', but 27% agreement for 'normal' appearances between phase 1 investigators and the central reader. In phase 2, weighted κ values for the 10 descriptors ranged from 0.34 to 0.65 within observers and from 0.30 to 0.45 between observers. The final model incorporated vascular pattern (normal/patchy/complete obliteration), bleeding (none/mucosal/luminal mild/luminal moderate or severe), and erosions and ulcers (none/erosions/superficial/deep), each with precise definitions. Together these explained 90% of the variance (pR2, Akaike Information Criterion) in the overall assessment of endoscopic severity, with predictions varying from 4 to 93 on a 100-point scale (from normal to worst endoscopic severity).
Conclusion: The Ulcerative Colitis Endoscopic Index of Severity accurately predicts overall assessment of endoscopic severity of UC. Validity and responsiveness need further testing before it can be applied as an outcome measure in clinical trials or clinical practice.
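As an illustration of the weighted κ statistic used to quantify observer agreement on ordinal descriptors, a minimal Python sketch follows. It is not the study's analysis code: the function name `weighted_kappa`, the choice of linear weights (the abstract does not state the weighting scheme) and the example ratings are all assumptions made for illustration only.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, n_categories, weight="linear"):
    """Cohen's weighted kappa for two sets of ordinal ratings coded 0..n_categories-1.

    Hypothetical helper for illustration; the weighting scheme is an assumption.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(ratings_a)
    power = 1 if weight == "linear" else 2  # anything else is treated as quadratic
    max_dist = (n_categories - 1) ** power

    # Observed disagreement: mean normalised distance between paired ratings.
    observed = sum(abs(a - b) ** power for a, b in zip(ratings_a, ratings_b))
    observed /= n * max_dist

    # Expected disagreement under independence, from the marginal counts.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j) ** power
        for i in range(n_categories)
        for j in range(n_categories)
    )
    expected /= n * n * max_dist

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected


# Made-up ratings of a four-level descriptor (0 = none ... 3 = most severe)
# by two hypothetical readers of the same eight videos.
reader_1 = [0, 1, 1, 2, 3, 3, 0, 2]
reader_2 = [0, 1, 2, 2, 3, 2, 1, 2]
print(round(weighted_kappa(reader_1, reader_2, n_categories=4), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; weighting penalises a disagreement of several levels more heavily than a disagreement between adjacent levels.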