Pause for Thought . . . Let’s ditch the numbers and focus on the comments when assessing learners

By Lynfa Stroud 

“Expectation is the root of all heartache.” – William Shakespeare            

In my role as an attending on the general medicine wards, I complete In-Training Evaluation Reports (ITERs) for the residents I work with. I am sure that most people reading this also either complete ITERs (or end-of-rotation reports) or, if they are still in training, have ITERs completed about them. Typically, the feedback provided to learners with this type of assessment is a combination of a numerical score and narrative comments. In my experience, residents seem to focus largely on the numerical score. This is somewhat understandable. After all, this score is usually used to make a judgment (passing or failing the rotation).

However, there is some evidence in the broader education literature (studies done with elementary school children)1 that numerical scores may reduce the likelihood that learners use narrative suggestions for future performance improvement, and that the presence of numerical scores may decrease recall of comments. Similarly, recent evidence in medical education suggests that feedback provided in a summative context may not be used effectively by students.2,3

To me, this potential issue is further exacerbated by uncertainty about how the score on an ITER is generated and what it even means. Anchors for numeric ratings on our ITERs fall into categories that may be familiar: Unsatisfactory, Needs Improvement, Meets Expectations, Exceeds Expectations, or Outstanding. The reference point stated is: compared to performance normally seen in trainees in this postgraduate year.

But what should our “expectations” be of residents in each postgraduate year? My expectations may differ from those of colleague X or colleague Y. (See here for a previous discussion on this point). Additionally, it is stated that “the expectation is” (no pun intended) that the majority of residents should fall into the category of “meets expectations”, equivalent to 3 out of 5 on a Likert scale. However, at least at my centre, we simultaneously recognize that only a minority of residents actually receive this designation on a given rotation; faculty who frequently use this part of the scale are often considered “hawks”, whereas residents who consistently receive this score are considered to be in potential academic difficulty. This means that most faculty expect to assign scores of 4 or 5, and most residents expect to receive these types of scores. In reality, shouldn’t almost all residents fall into “needs improvement”? After all, they are in a training program. It stands to reason that they likely have areas for further improvement in their performance, as do we all with life-long learning.

Perhaps a way for faculty to feel more comfortable giving meaningful feedback and for residents to be able to focus on the narrative for future improvement is to shift focus onto the formative comments. Maybe we don’t need “scores” for every single rotation? Instead, could we concentrate our efforts on providing specific, action-oriented formative feedback, rather than on positioning a resident on a scale? Maybe other programs are already doing this?

I recognize that we need some measure of how residents are doing and “red flags” for those who are struggling, ideally earlier rather than later in training. But there may be ways to identify these individuals through review of the narrative comments (although we may need to focus on improving the quality of the comments provided), rather than just by focusing on scores as a trigger.4 Removing scores from ITERs would likely benefit all residents. In my next post, I’ll include some thoughts about what the residents’ perspective on this might be.

  1. Butler R. Enhancing and undermining intrinsic motivation: The effects of task-involving and ego-involving evaluation on interest and performance. Br J Educ Psychol. 1988;58:1-14.
  2. Harrison CJ, Konings KD, Schuwirth L, Wass V, van der Vleuten C. Barriers to the uptake and use of feedback in the context of summative assessment. Adv Health Sci Educ Theory Pract. 2015;229-45.
  3. Harrison CJ, Konings KD, Molyneux A, Schuwirth L, Wass V, van der Vleuten CPM. Web-based feedback after summative assessment: How do students engage? Med Educ. 2013;47:734-4.
  4. Ginsburg S, Regehr G, Lingard L, Eva KW. Reading between the lines: faculty interpretations of narrative evaluation comments. Med Educ. 2015;49:296-306.


* I recognize that this post opens up many discussions (that I have not explicitly expanded on here) about issues such as better construct-aligned scales, normative versus criterion standards, milestones, EPAs, etc.