In today’s article, selected by Linda, the authors developed a computerized analysis to seek correlation between specific words on in-training evaluation reports (ITERs) and subsequent poor progress or failure.
Can language be a predictor of those at risk of failure? How accurate can it be? And how does one go about determining this? Read below, or better yet, listen in to learn more!
KeyLIME Session 248
Tremblay et al., 2019. Detection of Residents with Progress Issues Using a Key Word–Specific Algorithm. Journal of Graduate Medical Education. 11(6):656-62.
Linda Snell (@LindaSMedEd)
Identification of poor progression in residents often occurs late in training. Early identification of the 5-10% of residents in difficulty is a challenge.
Narrative comments in assessment have been shown to be valuable, and perhaps a semantic analysis of narrative information as part of CBME assessments might be useful for detecting problems earlier.
The question is: is there a relationship between specific words on in-training evaluation reports (ITERs) and subsequent poor progress or failure?
The authors “sought to develop a novel computerized semantic analysis, which consists of an algorithm that is able to detect residents with progress issues, based on certain key words. The goal is to create a functional algorithm to identify residents at risk of failure.”
Key Points on the Methods
Created a database of all ITERs for all residents at Université Laval from 2001 to 2013 (NB: this is before CBME in most disciplines).
An instructional designer reviewed all ITER narratives (~220,000 words) and proposed terms associated with reinforcing and under-performance feedback. The face validity or ‘practical significance’ of the words was confirmed.
An algorithm based on these key words was constructed by recursive partitioning. *A statistical method for multivariate analysis that creates an algorithm / decision tree to classify cases correctly by splitting them on dichotomous variables.
Specialty and family medicine were analyzed separately, as different ITERs were used.
These words / algorithms were compared to the final ‘fail’ or ‘in difficulty’ ratings.
The developed algorithm was applied with the aim of achieving 100% sensitivity while maximizing specificity. *i.e. looked first at true positives (sens), then at true negatives (spec)
Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated.
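To make the method a little more concrete, here is a minimal sketch of recursive partitioning on key word presence, followed by the four accuracy measures. It is not the authors' implementation: the key words (kw_a, kw_b, kw_c), the toy residents, the Gini impurity criterion and the "flag a leaf if any resident in it had progress issues" rule are all assumptions made for illustration, the last one chosen simply to favour sensitivity.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = progress issue)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, keywords):
    """Return the key word whose presence/absence split lowers impurity most."""
    best_kw, best_score = None, gini(labels)
    for kw in keywords:
        present = [y for r, y in zip(rows, labels) if kw in r]
        absent = [y for r, y in zip(rows, labels) if kw not in r]
        if not present or not absent:
            continue  # this key word does not divide the group
        score = (len(present) * gini(present)
                 + len(absent) * gini(absent)) / len(labels)
        if score < best_score:
            best_kw, best_score = kw, score
    return best_kw

def grow(rows, labels, keywords):
    """Recursively partition until no key word improves the split."""
    kw = best_split(rows, labels, keywords)
    if kw is None:
        # Assumed leaf rule: flag if ANY resident here had progress issues.
        return {"flag": any(labels)}
    rest = [k for k in keywords if k != kw]
    yes = [(r, y) for r, y in zip(rows, labels) if kw in r]
    no = [(r, y) for r, y in zip(rows, labels) if kw not in r]
    return {
        "keyword": kw,
        "present": grow([r for r, _ in yes], [y for _, y in yes], rest),
        "absent": grow([r for r, _ in no], [y for _, y in no], rest),
    }

def predict(tree, row):
    """Walk the tree for one resident's set of key words."""
    while "flag" not in tree:
        tree = tree["present"] if tree["keyword"] in row else tree["absent"]
    return tree["flag"]

def metrics(preds, labels):
    """Sensitivity, specificity, PPV and NPV from predictions and outcomes."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if not p and y == 0)
    return {"sens": tp / (tp + fn), "spec": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}

# Hypothetical toy data: key words found in each resident's ITER comments.
rows = [{"kw_a"}, {"kw_a", "kw_b"}, {"kw_b"}, {"kw_b"},
        set(), {"kw_c"}, set(), {"kw_c"}]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

tree = grow(rows, labels, ["kw_a", "kw_b", "kw_c"])
preds = [predict(tree, r) for r in rows]
result = metrics(preds, labels)
print(result)
```

On this toy set, two residents share identical key words but different outcomes, so the tree cannot separate them; flagging both keeps sensitivity at 100% at the cost of one false positive, which is the same trade-off the authors describe.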
Nearly 42,000 ITERs for nearly 3300 residents over 12 years.
Specialty residents: 100% sensitivity, 88% specificity. Progress issues were identified in 4% of residents, and the algorithm correctly classified all of them.
BUT it also flagged 256 residents as being in difficulty when they were not. So the PPV was 23.4%, NPV 100%.
Family medicine residents: 100% sensitivity, 79% specificity. Progress issues were identified in 6% of residents, and the algorithm correctly classified all of them.
BUT it also flagged 221 residents as being in difficulty when they were not. So the PPV was 23.3%, NPV 100%.
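Why is the PPV so low when sensitivity is perfect and specificity looks respectable? A short worked calculation helps: with few residents truly in difficulty, even a modest false positive rate applied to the large majority outnumbers the true positives. The sketch below uses the standard Bayes-style formulas with the rounded percentages quoted above as inputs; because those inputs are rounded, the output is only approximately in line with the PPVs reported in the paper (roughly a quarter in both groups).

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: share of flagged residents truly in difficulty."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: share of unflagged residents truly fine."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Rounded figures from the results above (sensitivity is 100% in both groups).
groups = [("specialty", 0.88, 0.04), ("family medicine", 0.79, 0.06)]
for name, spec, prev in groups:
    print(f"{name}: PPV ~ {ppv(1.0, spec, prev):.1%}, "
          f"NPV = {npv(1.0, spec, prev):.0%}")
```

With 100% sensitivity there are no false negatives, so the NPV is 100% whatever the prevalence; the PPV, by contrast, is driven almost entirely by how rare residents in difficulty are.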
Sensitivity was prioritized (set at 100%) and specificity maximized within that constraint, knowing that this would give a lower PPV. The authors felt this compromise was worthwhile as “the consequences of delaying the detection of a resident in difficulty are more important than reviewing the file of a good resident.” They calculated that the number of ‘false positive’ files to be reviewed would be no more than 1-2 per year for a PD.
With a low prevalence of residents in difficulty, a key word approach would give program directors with large resident cohorts, those with little experience, and the postgraduate associate dean a valuable flagging tool.
The authors conclude: “Classification and regression trees may be helpful to identify pertinent key words and create an algorithm, which may be implemented in an electronic assessment system to detect residents at risk of poor performance.”
Spare Keys – other take home points for clinician educators
Another use for data analytics!
Need to see the ‘actual’ words