The effectiveness of multisource feedback assessment (MSF) can be a source of debate: it is praised as a key method to obtain information, but can also be condemned as being steeped in bias. Is MSF dead…or can it be saved? Jason presents a “state-of-the-art narrative review” that looks closely at MSF, summarizing evidence to support its use, examining its limitations and pitfalls and providing practical guidelines to improve its use.
KeyLIME Session 304
Ashworth et al., A review of multi‑source feedback focusing on psychometrics, pitfalls and some possible solutions. SN Soc Sci (2021) 1:24
Jason R. Frank (@drjfrank)
In your career so far, you have probably had the experience of being the subject of a multisource feedback assessment (aka MSF or 360). Or maybe 2, or 3, or more. How was that experience? Enjoyable? Was it useful? Or did you feel resentful when you heard what others thought of you? You may not be surprised that previous work by Kluger et al showed that MSF programs had unintended negative effects in 33% of cases studied.
MSF has been hailed by #meded types (including, ahem, several of us KeyLIME hosts) as an important tool in our assessment arsenal. However, how much do we really know about the properties of MSF when applied to #meded? MSF has been touted as a key method to obtain information on communication, teamwork, and professionalism. In some jurisdictions it is a mandatory component of in-practice assessment of health professionals. It has also been suggested to be so riddled with inherent biases that any data is “inaccurate and meaningless”. So what is the truth about MSF? Is it dead?
Enter Ashworth, De Champlain, & Kain in Springer-Nature Social Sciences (yes KeyLIME scans many journals for you)…The authors set out to:
(1) summarize the psychometric evidence in support of the use of MSF in medical education;
(2) underscore the limitations and pitfalls of MSF and;
(3) provide practical guidelines to improve its defensibility and use.
Key Points on the Methods
This is a state-of-the-art narrative review about the psychometric properties of MSF as an assessment tool. It reads like a tour of the literature with a few pit stops for some stats. The authors begin by describing CBME, and then programmatic assessment, and suggest that there is a key role for MSF. They then use Kane’s validity framework to walk through the available literature from #meded and business to see the evidence available for each of Kane’s 5 inferences.
(We refer to the Kane/Messick/Cook framework on KeyLIME often, read more about it here. The 5 inferences are: scoring, generalization, extrapolation, explanation, and implications.)
Using Kane’s framework for making a validity argument, the authors describe the evidence they found for each inference…
1. The Scoring Argument looks at whether scores obtained from an assessment tool are plausible and appropriate. Here, the authors found that untrained raters may be a threat to validity. Furthermore, MSF raters may be peers or colleagues with an established relationship with the subject that skews their scores.
2. The Generalisation Argument looks at the reliability of candidate performances. The authors posit that a candidate’s performance as measured by MSF should be stable across raters and patients. However, the literature suggests that 9 rounds of MSF or 14 raters & 25 patients are needed to achieve a generalizability coefficient (g) of 0.8.
3. The Explanation Argument looks for evidence that MSF scores are related to the desired underlying competencies. The authors highlighted 2 factor analytic studies. One loaded on professionalism, communication, and team skills. Another loaded on an 8-factor structure: clinical knowledge+communication+ethics+excellence+humanism+accountability+altruism+integrity.
4. The Extrapolation Argument relates to the extent that the competencies found in MSF generalize to other relevant settings. The authors only found 1 study making this MSF argument, and it estimated that 11.5% of the rating variance was related to the context in which the data was collected.
5. The Implications Argument refers to the impact MSF scores have on physician behaviour or patient outcomes. Data in this domain tend to be self-reported and not corroborated by other data. It also tells a conflicting story. Most studies found no impact with elaborate, structured longitudinal coaching for participants. Business literature indicates an effect size in the low range of 0.12.
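To give a feel for why the Generalisation Argument in point 2 demands so many raters, here is a minimal sketch using the Spearman-Brown prophecy formula, a simplified single-facet stand-in for the generalizability analyses the review draws on. The function names and the single-rater reliability values are hypothetical and chosen for illustration only; they are not taken from the paper, and real MSF designs (raters crossed with patients and occasions) are more complex than this.

```python
import math


def projected_reliability(single_rater_r: float, n_raters: int) -> float:
    """Spearman-Brown: reliability of the average of n_raters parallel ratings."""
    return n_raters * single_rater_r / (1 + (n_raters - 1) * single_rater_r)


def raters_needed(single_rater_r: float, target: float = 0.8) -> int:
    """Smallest number of raters whose averaged score reaches the target reliability."""
    n = target * (1 - single_rater_r) / (single_rater_r * (1 - target))
    # Small tolerance so floating-point round-off does not add a spurious rater.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Hypothetical single-rater reliabilities, NOT values reported in the review.
    for r in (0.2, 0.3, 0.5):
        n = raters_needed(r)
        print(f"single-rater r={r}: {n} raters -> {projected_reliability(r, n):.2f}")
```

The point of the sketch is the shape of the relationship: when any one rater's score is only weakly reliable, the number of raters needed to reach 0.8 climbs quickly, which is consistent with the large rater and patient counts quoted above.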
For every element, the authors found limited evidence to support the validity inference for MSF’s use as an assessment strategy. Furthermore, they identified 3 major threats to validity in the use of MSF in #meded:
A. Medical MSF usually involves patients and co-workers, and rarely a supervisor’s view. They call this the “180”, not “360”.
B. The Four Paradoxes of Peer Review operate on medical MSF: 1) peer physicians struggle to be an impartial judge and caring colleague at the same time; 2) assessing a peer can hurt the teams physicians are often embedded in; 3) rating scales used to increase responses may harm the quality of observation data; and 4) high-stakes assessments vs low-stakes may have dove/hawk effects on raters.
C. Disconnects between the intent of MSF programs and their actual implementation are also at play. The authors suggest the literature shows that physicians do not find MSF results credible.
The authors conclude that MSF is an important assessment approach—it is the sole method published for several kinds of competencies—but is fundamentally flawed in its current form.
The authors provide recommendations for improving MSF:
1. End opportunistic sampling of raters.
2. Use different scoring strategies to minimize rater range restriction.
3. Be more transparent, especially about the intended and disallowed uses of MSF data.
4. Include MSF as part of a holistic program of assessment.
5. Employ rich comments, along with scores, in decision making.
6. Train raters.
Spare Keys – Other take home points for Clinician Educators
- We have done a few of these state-of-the-art type narrative reviews. While they may be less systematic and subject to author biases, they are also very useful to summarize “the story so far” in a field. They allow others to build on previous work and are often highly cited.
- Beware the pitfalls of looking at any single assessment tool in isolation. We advocate for thinking about an assessment system, with its ecosystem of various observations and data points being integrated into a more holistic view of an individual’s competency development (programmatic assessment).
- Many of the criticisms in this paper about MSF apply to most other assessment tools available too. Much more to do in #meded research.
The views and opinions expressed in this post and podcast episode are those of the host(s) and do not necessarily reflect the official policy or position of The Royal College of Physicians and Surgeons of Canada. For more details on our site disclaimers, please see our ‘About’ page