#KeyLIMEpodcast 125: SET Up To Fail – Wars over Student Evaluation of Teaching

The Key Literature In Medical Education returns to the topic of student evaluation of teaching (SET).  We last tackled this in Episode #85.

This epic systematic review and meta-analysis suggests that (SPOILER ALERT) student evaluation is ….  Nah, you’re going to have to read the abstract (below) or listen to the podcast to get the punchline.  As Jason suggests, maybe we don’t need to get rid of SET altogether, but we should reframe it as “learner satisfaction” and not “teacher evaluation.”

  • Jonathan


KeyLIME Session 125 – Article under review:

Listen to the podcast

View/download the abstract here.


Uttl B, White CA, Gonzalez DW.  Meta-analysis of faculty’s teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation. 2016. [ePub ahead of print]

Reviewer: Jason R. Frank (@drjfrank)


In episode 85 of the #KeyLIMEpodcast, we reviewed the perennial debate about the role of learner ratings in assessing teacher performance and effectiveness.  Essentially it boils down to a question: “Do student perceptions of teacher effectiveness correlate with learning outcomes?” Learner ratings are usually some combination of a global Likert score and scores on a number of supposed teaching characteristics, harvested near the end of a course. When we reviewed Spooren et al., we discussed how there is no single broadly accepted theory of teaching, no agreed-upon set of valued characteristics, and only very weak correlations with learning outcomes (almost always some kind of exam).

However, student evaluations of teaching (aka “SET” in higher ed) are essentially universal. The careers of teaching faculty (though not usually research faculty) can begin, rise, or fall on them. Why do we still use and give so much weight to SET? The top reasons are:

  1. SET data are cheap and easy;
  2. SET data are cheap and easy;
  3. SET data are cheap and easy;
  4. Students are positioned to observe the teacher;
  5. SET data allow students a voice in their education;
  6. SET data are a kind of social accountability.

Unfortunately, students being present with a pulse does not make them experts on teaching or even learning (cf. the Dunning-Kruger effect). In fact, there is evidence that numerous factors beyond teaching impact learning (called “TEIFs” in the higher ed lit… we all have our jargon). Thus, we have an ongoing debate and tension in educational design worldwide.


“Does SET correlate with learning outcomes in multi-section studies?” Uttl, White, and Gonzalez set out to reexamine the evidence for SET. They chose to a) reanalyze the most highly cited meta-analyses using contemporary methods, and b) conduct their own meta-analysis of SET from primary studies.

Type of Paper

Systematic review/Meta-analysis

Key Points on Methods

After a very long and interesting introduction to the problem and the available literature (that crosses dangerously into a full-on rant), the authors describe their two approaches to evaluating SET:

1) A re-analysis of the data from the most-cited previous meta-analyses of SET that had explained their methods. This required going back to microfiche of original PhD theses, etc., to get the necessary data and insights.

2) A primary meta-analysis and critical appraisal of all the available multi-section studies (essentially randomized trials of the effect of teaching on learning) in higher ed. To be included, studies had to be in English, involve multiple random groups of learners, and report an objective measure of learning as a measure of association.

Key Outcomes

  1. The original and most highly cited SET meta-analysis was Cohen’s (1981), with a correlation of r=.43.
  2. A re-analysis of the meta-analyses by Cohen (1981), Feldman (1989), and Clayson (2009) revealed numerous fundamental methodological flaws and threats to validity. Cohen’s meta suggested that at best 10% of the variance in learning is related to teaching. Feldman’s findings were an artifact of uncontrolled small studies; the apparent contribution of teaching disappears when results are weighted by sample size. Clayson’s methods were found to be fundamentally compromised.
  3. A primary meta-analysis with numerous analyses and adjustments essentially found NO correlation (expressed in fancy stats).
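As a back-of-envelope check on what correlations of this size mean, recall that the share of variance explained is the square of the correlation coefficient (a standard statistical identity, not a figure taken from the paper; the r ≈ .32 line below is my own back-calculation from the “at best 10%” claim above):

```latex
% Variance explained (coefficient of determination) is the squared correlation:
\text{variance explained} = r^2

% Cohen's (1981) reported correlation:
r = .43 \;\Rightarrow\; r^2 = (.43)^2 \approx .18 \quad (\approx 18\% \text{ of variance})

% The "at best 10%" figure corresponds to a smaller effective correlation:
r \approx .32 \;\Rightarrow\; r^2 \approx .10 \quad (\approx 10\% \text{ of variance})
```

In other words, even taking the older, more optimistic meta-analyses at face value, the large majority of variance in learning outcomes is unexplained by SET ratings.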

Key Conclusions

The authors conclude that, in this contemporary meta-analysis of randomized learner studies, there is essentially no correlation between SET ratings and learning. (There was, however, a correlation between learners’ prior knowledge and learning outcomes.)

Spare Keys – other take home points for clinician educators

  1. This is a long study that has been harshly critiqued by some higher ed administrators as “sour grapes,” but it is well written and defensible. Kudos to the authors for having the courage to shine a light on dogma, even when it is inconvenient in education.
  2. The meta-analytic techniques in this paper are interesting and give useful hints about the kinds of questions one can ask using such methods.
  3. We should stop calling such data “teacher evaluations” and label them for what they are: “learner satisfaction.” Still valuable, just not evidence of learning.
  4. Notwithstanding that this paper deals with teaching in a setting different from clinical medicine, maybe we should consider some radical alternatives for evaluating teaching effectiveness (peer review, expert review, objective measures of learning, evaluation of teaching EPAs, etc.).
  5. I got this in-press paper off Twitter – another example of the power of social media.

Shout out

Send us your thoughts – is this the death knell of learner evaluations?

Access KeyLIME podcast archives here

Check us out on iTunes