- Rachel Yudkowsky, Yoon Soo Park, Janet Riddle, Catherine Palladino, and Georges Bordage.
- Dr. Yudkowsky is associate professor, Department of Medical Education, and director, Dr. Allan L. and Mary L. Graham Clinical Performance Center, University of Illinois at Chicago, Chicago, Illinois.
- Acad Med. 2014 Jul 1; 89 (7): 1057-62.
Purpose: High-quality checklists are essential to performance test score validity. Prior research found that physical exam checklists of items that clinically discriminated between competing diagnoses provided more generalizable scores than all-encompassing thoroughness checklists. The purpose of this study was to compare validity evidence for clinically discriminating versus thoroughness checklists, hypothesizing that evidence would favor the former.

Method: Faculty at four Chicago-area medical schools developed six standardized patient (SP) cases with checklists of about 20 items ("thoroughness [long] checklists"). Four clinicians identified a subset of items that clinically discriminated between competing diagnoses of each case ("clinically discriminating [short] checklists"). Cases were administered to 155 University of Illinois at Chicago fourth-year medical students during their 2011 Clinical Skills Examination (CSE). Validity evidence was compared for CSE scores based on thoroughness versus clinically discriminating checklist items.

Results: Validity evidence favoring clinically discriminating checklists included response process: greater SP checklist accuracy (kappa = 0.75 for long and 0.84 for short checklists, P < .05); internal structure: better item discrimination (0.28 long, 0.42 short, P < .001); internal consistency reliability (0.80 long, 0.92 short); standard error of measurement (z score 8.87 long, 8.05 short); and generalizability (G = 0.504 long, 0.533 short). There were no significant differences overall in relevance ratings, item difficulty, or cut scores of long versus short checklist items.

Conclusions: Limiting checklist items to those affecting diagnostic decisions resulted in better accuracy and psychometric indices. Thoroughness items performed without thinking do not reflect clinical reasoning ability and contribute construct-irrelevant variance to scores.
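For readers unfamiliar with two of the indices reported above, the following is a minimal sketch of how Cohen's kappa (chance-corrected agreement between two raters) and a classical-test-theory standard error of measurement are commonly computed. It is not the authors' analysis code. The function names, the rating vectors, and the formula SEM = SD × sqrt(1 − reliability) are assumptions for illustration, and the data are invented, so the outputs do not reproduce any value in the abstract.

```python
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items as 0 (not done) or 1 (done)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal rate of scoring 1.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both raters gave one identical constant rating
    return (observed - expected) / (1 - expected)


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM (assumed formula): SD * sqrt(1 - reliability)."""
    return score_sd * sqrt(1 - reliability)


# Hypothetical data: an SP's checklist entries versus a reviewer's entries
# for the same ten items of one encounter.
sp_ratings = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
reviewer_ratings = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

kappa = cohen_kappa(sp_ratings, reviewer_ratings)
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.84)
print(f"kappa = {kappa:.3f}, SEM = {sem:.2f}")
```

In this sketch, higher reliability shrinks the SEM for a fixed score spread, which is the direction of the long-versus-short comparison described in the Results.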