EDUCATION THEORY MADE PRACTICAL – Volume 3, Part 7: Realist Evaluation

As part of the ALiEM Faculty Incubator program, teams of 2-4 incubator participants authored a primer on a key education theory, linking the abstract to practical scenarios. For the third year, these posts are being serialized on our blog, as a joint collaboration with ALiEM. You can view the first e-book here – the second is nearing completion and will soon be released.  You can view all the blog posts from series 1 and 2 here.

The ALiEM team loves hearing your feedback prior to publication. No comment is too big or too small and they will be used to refine each primer prior to the eBook publication.  (note: the blog posts themselves will remain unchanged)

This is the seventh post of Volume 3. You can find the previous posts here: Bolman and Deal’s Four-Frame Model; Validity; Mayer’s Cognitive Theory of Multimedia Learning; The Kirkpatrick Model: Four Levels of Learning Evaluation; Curriculum Development; and Programmatic Assessment.


Realist Evaluation

Authors: Jason An; Christine Stehman (@crsemcccf); Randy Sorge

Editor:  Jordan Spector

Main Authors or Originators: R Pawson & N Tilley
Other important authors or works: G Wong, T Greenhalgh

Part 1: The Hook

City Hospital recently hired Claire as a senior nursing administrator. Claire was recruited after working for ten years across town at Ivory Tower Hospital, spearheading a number of operational innovations to improve Emergency Department (ED) metrics. City Hospital hired Claire with hopes that she could replicate the interventions and similarly improve metrics.

In her first departmental meeting, Claire proposed a number of system revisions with regard to triage processes at the City Hospital. Claire recommended moving patients to the treatment area immediately upon arrival in the ED, to permit physician evaluation, registration and triage to occur simultaneously, with the goal of reducing time to provider and total patient time in the ED. Her proposal included the creation of a “sort RN” who would briefly assess patient acuity and assign the patients to one of two treatment areas – a lower acuity venue with mostly chairs, or a higher acuity area with stretchers. After the patient was assigned, the patient would be triaged by the nurse, registered, and seen by a physician in the treatment room in seamless succession. In her experience, this model worked well at Ivory Tower Hospital, so she reasoned that it should also work at City Hospital.

Though several physicians expressed concerns with the new triage process, Claire was confident it would work. After all, it did at Ivory Tower Hospital! Claire privately dismissed the concerns as systemic resistance to change (especially change proposed by a newcomer). In the end, the Department Chair supported Claire’s proposal, and, within weeks, City Hospital’s ED remodeled their triage system to resemble the work flow at Ivory Tower Hospital.

ED administrators met again eight weeks after the new triage operations were implemented. During this session, it seemed that everyone was unhappy, calling the new system “a disaster”. Physicians expressed concern over inaccurate triage, citing significant delays in obtaining vital signs and EKGs for their patients. There had been several high-risk cases where an otherwise well-appearing patient was triaged to the low acuity zone, only to have been identified as ill, delaying appropriate care. Furthermore, there was no clear process for “upgrading” patients from the low acuity zone to the high acuity zone. Hospital leadership complained to the ED directors that ED metrics looked worse, as a larger number of patients were leaving the ED prior to being fully registered, resulting in lost revenue relative to the previous City Hospital triage model.

The Department Chair calls Claire into his office. He wants to hear her thoughts about why her system failed to improve metrics at City Hospital as it had at Ivory Tower Hospital. Claire is baffled, embarrassed, and needs to find a way to make things right.

Part 2: The Meat


Realist (or realistic) evaluation (RE) emerged in the 1990s as a method to evaluate complex social programs. Prior to this method, investigators would attempt to assess and optimize features of social programs through simple cause-and-effect analyses, in essence asking “can we make the program better with a particular intervention?” If the intervention was associated with a desired result, investigators would argue that the intervention was ‘the cause’ that directly led to the desired ‘effect’. And for program directors, it followed that the same intervention would work anywhere it was implemented to achieve the same desired effect. Not surprisingly, this assumption proved false, as many programmatic interventions did not have the same effect in different venues. Ray Pawson and Nick Tilley were the first to describe that complex social programs depend on the interactions of the participants, the available resources, and the environment in which they occur.1 In RE, instead of asking simply whether the program works, evaluators attempt to answer why and what about a program works, for whom it works, and within what circumstances (the ‘when and where’) it works.1 Each evaluation begins with the generation of hypotheses that might explain the balance of resources and choices for each program participant that will lead to a desired outcome. Evaluators then test these hypotheses and modify the program based on the results.1 The creators grounded realist evaluation in scientific realism and the generative theory of causation, building upon the work of many others.1,2


Realist Evaluation provides users a method for performing evaluation research. Evaluation research is a class of research that focuses on how programs work and how to optimize program effectiveness.3 It initially focused on social programs, such as installing cameras in a parking lot to decrease car break-ins or starting literacy lessons to increase parental involvement in school, but now describes evaluation of any program designed to change behaviors.3 In the past, researchers followed a successionist theory of causation: investigators would set up a trial program in one area and compare its outcomes to outcomes in a similar area where the program had not been implemented (a control area). Results of these “experiments” led investigators to declare success (or not) through inferred causation. However, when investigators attempted the same intervention in a new location and it did not succeed, program stakeholders were left confused.

The pioneers of RE (Pawson and Tilley) argued that within complex systems and programs, simple ‘cause and effect’ analyses were not applicable, as they often did not account for factors in the milieu that directly contributed to the success or failure of such programs.1,2 The authors put forth a generative theory of causation, which relies on observations of patterns between inputs (causes) and outputs (effects), including both externally observable features and possible internal features. These observations allow for a more nuanced examination of cause and effect, both when the cause is effective towards the desired ends and when it is not. The authors cite the use of gunpowder as an example: gunpowder is an effective explosive specifically when the conditions and circumstances are optimized: densely packed, in sufficient quantity, dry, etc.1

Within this construct, the efficacy of a social program can only be understood through examination of the context. In other words, only through identification of a program’s interaction with cultural, social and economic factors can an investigator determine how and why a program would work to produce (or not produce) the desired outcome.

RE evaluates mechanisms suspected to bring about change and asks in which contexts these changes occur. It does so through a cycle of experimentation. First, investigators formulate a theory of how everything (intervention, choices, relationships, behavior, conditions) comes together to generate an outcome. Then they generate hypotheses about what might produce a desired, sustainable change. With appropriate methods (often qualitative and quantitative), they test these hypotheses with the goal of results or outcomes that are highly specific to the context. Should the results deviate from expected, theories are modified, and the cycle continues.1
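For readers who think in code, the cycle above can be sketched as a simple data structure and loop. This is purely an illustrative sketch: the class, field, and function names are our own invention, not part of Pawson and Tilley's framework, and a real evaluation would replace the toy test function with mixed-methods data collection.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class CMOHypothesis:
    """One context-mechanism-outcome (CMO) conjecture about a program."""
    context: str    # for whom / under what circumstances
    mechanism: str  # what about the program is expected to produce change
    outcome: str    # the change the mechanism should generate
    supported: Optional[bool] = None  # unknown until tested

def refine(hypotheses: List[CMOHypothesis],
           test: Callable[[CMOHypothesis], bool],
           ) -> Tuple[List[CMOHypothesis], List[CMOHypothesis]]:
    """One pass of the realist cycle: test each conjecture against data,
    keeping supported ones and flagging the rest for theory revision."""
    supported, to_revise = [], []
    for h in hypotheses:
        h.supported = test(h)
        (supported if h.supported else to_revise).append(h)
    return supported, to_revise

# Illustrative use, echoing the triage scenario from Part 1:
hypotheses = [
    CMOHypothesis("well-staffed ED", "parallel triage and registration",
                  "shorter time to provider"),
    CMOHypothesis("understaffed ED", "parallel triage and registration",
                  "shorter time to provider"),
]
# A toy "test" standing in for real qualitative/quantitative data collection:
supported, to_revise = refine(hypotheses, lambda h: "well-staffed" in h.context)
print(len(supported), len(to_revise))  # 1 1
```

The point of the sketch is that the same intervention (mechanism) is evaluated separately in each context, and the unsupported conjectures are not discarded but fed back into a revised theory for the next cycle.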

In summary, realist evaluation uses a methods-neutral, theory-driven model to identify, state, test, and refine theories as to what about a program works and why, for which population, and in what circumstances.

Modern takes or advances

In addition to RE, the term ‘realist synthesis’ or ‘realist review’ was coined to describe the synthesis of data when analyzing complex systems – not to distill complex issues down to simple descriptions, but to guide programmatic leadership and policy makers with sophisticated and practical analyses to utilize in the planning and administration of local or regional programs.4,5,6

With such a broad definition, one can see how this construct might be applied to a number of additional areas of study, beyond social programs. In the 20 years since Realistic Evaluation was published, realist evaluation has been applied in the areas of medicine and healthcare systems to improve infection control, interpersonal-skills assessment, disease-specific health initiatives (such as heart health and mental illness treatment), e-learning, faculty development, and medical education in general.7,8,9,10,11,12,13,14

There have been three modifications of RE that warrant acknowledgement here.

Keller et al. described the combination of RE with design theory as a means to evaluate complex innovations.15 The authors advocate that innovation should begin with explicit identification of the underlying assumptions behind the innovation, before performing a realist evaluation, to better understand the context-mechanism-outcome triads evident after implementation. The authors argue that this combination method will buttress the expected efficacy and increase dissemination of such innovations.15

Bonell et al. offered a counter to the typical RE position (set forth by Pawson and Tilley) that randomized controlled trials are too narrowly defined to be pertinent in the assessment of complex public interventions. This piece is long and largely theoretical, but it sets forth a series of examples where data from RCTs led to a better understanding of a complex system, and the authors take the position that there need not be tension between RCTs and realism.16

Finally, Ellaway et al. describe a hybrid approach to systematic review.17 This study sought to identify whether and how communities that host medical education programs affect those programs.17 Their literature search identified a number of papers for investigation, though only about half reported empirical data (rather than narrative or theoretical exposition).17 All studies were examined using both an outcomes method of systematic review and a realist evaluation. Overall, the authors argue that this dual method of review created a deeper understanding of their study question and the literature as a whole.17

Other examples of where this theory might apply in both the classroom & clinical setting

Problem-based learning (PBL) is a curricular provision in many medical schools where learners examine complex patient care vignettes in small group discussions. Devised as a means to make medical learning more active and interesting for the learner, some data suggest that graduates of PBL curricula demonstrate equivalent or superior professional competencies compared with graduates of a more traditional medical school curriculum.18 The PBL model is a form of RE for the learner, as lessons address clinical issues with respect to the whole of the patient. In addition, as PBL has been shown to be effective yet is variably implemented across schools, RE might aid in an analysis of which schools, which learners, and under which circumstances students would most benefit from a PBL curriculum.

There has been an exponential increase in the number of online learning resources in medical education (podcasts, websites, and blogs – termed E-learning).19 A realist review of internet-based medical education by Wong et al. sought to describe who is utilizing these resources, when they are using them, and to what benefit. The authors demonstrate that learners were more likely to utilize E-learning if it offered a perceived advantage over non-internet alternatives, was technically easy to use, was compatible with those learners’ values and norms, and provided opportunity to interface with a teacher or tutor.12

Annotated Bibliography of Key Papers

Pawson R, Tilley N. Realistic Evaluation Bloodlines. Am J Eval. 2001;22(3):317-324.20

Written by the authors most associated with RE, this piece describes the RE method in the context of regional blood donation practices. The authors relay six articles that address different ways of acquiring and distributing blood. The article concludes with six maxims to improve future evaluations, maxims which many consider foundational within the realist approach. These include:
1. Always speak of evaluations in the plural – advocating for a broad array of investigative questions of one’s program model, the combination of which are necessary to better understand and optimize complex programs.
2. Be unafraid to ask big questions of small interventions and to use small interventions to test big theories.
3. Use multiple methods and data sources in the light of opportunity and need.
4. Figure out which mechanisms are relevant to produce optimum outcomes by context.
5. Never expect to know “what works”, just keep trying to find out.
6. Direct meta-analytic inquiries at common policy mechanisms – advocating for a thorough evaluation of the attempts at a particular intervention in multiple programs in a community, with the various outcomes, to better understand the whole.20

Wong G, Greenhalgh T, Westhorp G, Pawson R. Realist methods in medical education research: What are they and what can they contribute? Med Educ. 2012;46(1):89-96.14

RE is relevant to medical education research and practices. This article explains realism theory in detail and prescribes key principles in the performance of realist research. The authors include concrete examples of the circumstances within medical education in which realist approaches can be used effectively, a feature that makes this article worthwhile for medical educators.

Wong G, Greenhalgh T, Pawson R. Internet-based medical education: A realist review of what works, for whom and in what circumstances. BMC Med Educ. 2010;10(1).12

This study is a realist review of internet-based medical education. The authors use two theories (Davis’s Technology Acceptance Model and Laurillard’s model of integrative dialogue) to outline the characteristics of internet-based medical education, arguing the intuitive point that internet learning materials have value relative to the learner and the learning context. The authors provide a list of questions based on their research in order to help educators and learners choose the appropriate internet-based course for their specific situation.


Limitations

The main limitation of realist evaluation is that it is both time and resource intensive.

A realist evaluation often requires large interdisciplinary teams with high levels of experience and training to carry out complicated evaluations.6 RE takes time: to analyze all variables and to accurately understand the interactions between interventions and outcomes. Because a realist evaluation utilizes both qualitative and quantitative data, the review often needs to draw upon a wide range of information from a number of diverse primary sources, which can also be onerous.6

Another limitation of realist evaluation, as with any theory-driven evaluation, is the logistics required for a technically sound and thorough study. Common logistical issues include how to define the scope of the review; how to determine the depth of assessment for each of a multitude of variables; how to determine what body of literature to search; how to critically appraise a very diverse sample of primary studies; how to collate, analyze, and synthesize the findings; and how to make recommendations that are both defensible and useful.21

Finally, a realist review attempts to evaluate both explicit and hidden variables, but because it cannot possibly account for every variable, realist studies tend to produce tentative recommendations at best. By its very nature, it cannot produce generalizable recommendations, specifically because all conclusions are context specific. Ultimately, the results of an RE often provide nothing more than recommendations to fine-tune a program or system.6

Part 3: The Denouement

Claire’s system failed because she did not have a full understanding of all the contributors to the triage process in the Emergency Department of her new hospital. Despite her operations experience and her best intentions, Claire missed the site-specific operational differences, as well as the environmental and cultural conditions that directly impact the efficiency of triage at her new institution.

After some education in the RE model of program refinement, Claire tried to address the individual features that had bearing on the efficiency of ED triage at City Hospital. Slowly, Claire learned enough from the various stakeholders to refine her triage model successfully. She learned that City Hospital has significantly fewer staff per patient than her previous institution, and that her original triage model left registration staff completely overwhelmed. A new and improved triage work flow needed a mechanism to prevent providers from discharging a patient before registration was complete, so Claire worked with Information Technology staff to create a ‘hard stop’ in the EMR preventing placement of a discharge order until after ED registration was complete.

Claire also learned that City Hospital has many more non-English-speaking patients than her prior hospital, creating a language barrier in all facets of the ED visit – a variable she never had to account for at Ivory Tower Hospital. Claire worked to place language-line phones in every treatment room. In addition, Claire hired volunteers to work in the ED waiting room, offering non-medical assistance, to reduce the number of patients who left without being seen.

It was specifically through Claire’s effort to better understand the intricacies of her new hospital (i.e., how the system currently operates, its staffing capabilities, its patient population) that operational metrics began to improve at City Hospital. In fact, about one year after Claire’s arrival, with the new system borne of Claire’s realist evaluation of City Hospital, the ED posted some of the best efficiency metrics in the history of the institution.

Don’t miss the 8th post in the series, coming out Tuesday, June 18, 2019!



1. Pawson R, Tilley N. Realistic Evaluation. Sage; 1997. Accessed June 29, 2018.

2. Tilley N, Pawson R. Realistic Evaluation: An Overview. Br J Sociol. 2000;49(September):331.

3. Powell R. Evaluation Research: An Overview. Libr Trends. 2006;55(1):102-120.

4. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist Synthesis: An Introduction. ESRC Research Methods Programme, University of Manchester; 2004.

5. Wong G, Greenhalgh T, Pawson R. What is realist review and what can it do for me? An introduction to realist synthesis. [accessed 27 June 2018].

6. Pawson R, Greenhalgh T, Harvey G, Walshe K. Realist review – a new method of systematic review designed for complex policy interventions. J Health Serv Res Policy. 2005;10(Suppl 1):21-34.

7. Williams L, Burton C, Rycroft-Malone J. What works: a realist evaluation case study of intermediaries in infection control practice. J Adv Nurs. 2013;69(4):915-926.

8. Greenhalgh T, Humphrey C, Hughes J, Macfarlane F, Butler C, Pawson R. How Do You Modernize a Health Service? A Realist Evaluation of Whole-Scale Transformation in London. Milbank Q. 2009;87(2):391-416.

9. Meier K. A realistic evaluation of a tool to assess the interpersonal skills of pre-registration nursing students. 2012.

10. Clark AM, MacIntyre PD, Cruickshank J. A critical realist approach to understanding and evaluating heart health programmes. Health (London). 2007;11(4):513-539.

11. Chidarikire S, Cross M, Skinner I, Cleary M. Treatments for people living with schizophrenia in Sub-Saharan Africa: an adapted realist review. Int Nurs Rev. 2018;65(1):78-92.

12. Wong G, Greenhalgh T, Pawson R. Internet-based medical education: A realist review of what works, for whom and in what circumstances. BMC Med Educ. 2010;10(1).

13. Sorinola OO, Thistlethwaite J, Davies D, Peile E. Realist evaluation of faculty development for medical educators: What works for whom and why in the long-term. Med Teach. 2017;39(4):422-429.

14. Wong G, Greenhalgh T, Westhorp G, Pawson R. Realist methods in medical education research: what are they and what can they contribute? Med Educ. 2012;46(1):89-96.

15. Keller C, Lindblad S, Gäre K, Edenius M. Designing for Complex Innovations in Health Care : Design Theory and Realist Evaluation Combined.

16. Bonell C, Fletcher A, Morton M, Lorenc T, Moore L. Realist randomised controlled trials: A new approach to evaluating complex public health interventions. Soc Sci Med. 2012;75(12):2299-2306.

17. Ellaway RH, O’Gorman L, Strasser R, et al. A critical hybrid realist-outcomes systematic review of relationships between medical education programmes and communities: BEME Guide No. 35. Med Teach. 2016;38(3):229-245.

18. Neville AJ. Problem-based learning and medical education forty years on. A review of its effects on knowledge and clinical performance. Med Princ Pract. 2009;18(1):1-9.

19. Cadogan M, Thoma B, Chan TM, Lin M. Free Open Access Meducation (FOAM): the rise of emergency medicine and critical care blogs and podcasts (2002-2013). Emerg Med J. 2014;31(e1):e76-7.

20. Pawson R, Tilley N. Realistic Evaluation Bloodlines. Am J Eval. 2001;22(3):317-324.

21. Greenhalgh T, Wong G, Westhorp G, Pawson R. Protocol – Realist and meta-narrative evidence synthesis: Evolving Standards (RAMESES). BMC Med Res Methodol. 2011.