(From the E-i-C: This is our new series on methodology, edited by Lynfa Stroud. Every month for the next year we are going to review a methodology that you, as a CE, frequently encounter in the literature. If there is a topic that you really want to see, send us an email at email@example.com, and if there is a resource that you think adds to the post, please include it in the comments section. -Jonathan)
Which is better for learning: PBL or lecture? An e-module or the flipped classroom? Blog posts or textbooks? Our curriculum or theirs? The question of ‘better’ is something teachers, researchers, and deans commonly face. The intervention that leads to superior learning is obviously a good thing for the students. So how do we determine which is superior? A simple and intuitive answer is to pit the two (or more) interventions against each other in a head-to-head experimental test, i.e. a classic randomized controlled trial (RCT), A vs. B.
The classic RCT from clinical medicine that most people are familiar with (e.g. a drug vs. placebo) is a simple superiority trial intended to show the efficacy of one intervention over another. At the very heart are the features of randomization of individuals and comparison of outcomes between two or more groups.
While this design is necessary as a first step it is also limited in the type of information it provides. Superiority of an intervention is often demonstrated in the ‘best’ possible conditions: motivated teachers, strict inclusion/exclusion criteria of participants, close involvement of experts etc. Efficacy – performance under the best conditions – is no guarantee of effectiveness: performance under regular or real world conditions. For education, real world conditions mean implementation with teachers of varying skill or comfort level (not to mention multiple clinical and administrative responsibilities), uneven educational resources, time constraints, and learners with increasingly busy schedules. For a small school this could mean variation in effectiveness across classrooms. For a large residency program, it could mean variation across multiple training sites and locations. Beyond these complexities, researchers and decision makers must also ask questions that go beyond just the educational outcomes. Time, cost, resources, learner and faculty satisfaction are also factors that will affect the sustainability of any educational approach. These complexities don’t mean we can’t conduct RCTs. Instead, we might have to think of creative ways in which to play with RCT designs to get answers within our own practical and day-to-day educational lives.
Two variations of RCTs worth considering are cluster-randomized trials and stepped-wedge designs. In cluster trials, instead of randomizing individuals, groups of individuals defined by a common feature (e.g. same teacher, same site, same background) are randomized. These trials are commonly used in clinical work and in research in K-12 education. Cluster randomization helps increase sample size and helps a researcher capture variation in educational contexts. In this design, some clusters will receive the intervention while other clusters will not. In the context of an actual curriculum, denying an intervention to one group may not always be ethical or possible. Stepped-wedge designs also randomize clusters, but instead of randomizing whether an intervention will be given, the researcher randomizes when it is given. For example, if the intervention is a transition to a flipped classroom, each classroom or site participating in the study would make the transition at a different time point. Each time a classroom adopts the intervention, all classrooms are evaluated on the outcome, whether or not they have made the transition yet. Thus the addition of each classroom to the intervention enables continued study of the effect and continual refinement of the process of implementing the intervention (e.g. faculty and student development, educational resourcing, etc.). By the end of the study, all classrooms will have received the intervention. In many circumstances where innovations to curricula are being piloted at one or two sites, a stepped-wedge RCT can be conducted as a natural experiment. This allows researchers and policy makers to study the intervention across the range of relevant real-world contexts as well as generate knowledge about implementation. Coupled with collection of data on a larger number of outcomes, a stepped-wedge design can be a useful way to collect high-quality evidence within the complex and messy world of education.
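To make the idea of randomizing when (rather than whether) a cluster receives the intervention more concrete, here is a minimal Python sketch of a stepped-wedge schedule. It is an illustration only: the function name `stepped_wedge_schedule`, the classroom labels, and the choice of one baseline period followed by one cluster crossing over per period are all assumptions for the example, not something prescribed by the designs described above.

```python
import random


def stepped_wedge_schedule(clusters, seed=None):
    """Randomize the order in which clusters cross over to the intervention.

    Returns a dict mapping each cluster to a list of 0/1 flags, one per
    measurement period (0 = not yet transitioned, 1 = intervention).
    Assumption for this sketch: period 0 is a baseline in which no cluster
    has the intervention, and exactly one cluster crosses over at each
    later period.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the randomization step: who transitions when

    n_periods = len(order) + 1  # baseline + one step per cluster
    schedule = {}
    for step, cluster in enumerate(order, start=1):
        # The cluster is exposed from its own step onward.
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_periods)]
    return schedule


if __name__ == "__main__":
    # Hypothetical classrooms; every one is measured in every period.
    plan = stepped_wedge_schedule(["Room A", "Room B", "Room C", "Room D"], seed=2024)
    for room, flags in plan.items():
        print(room, flags)
```

Each row of the printed schedule shows one classroom moving from 0 to 1 at its randomly assigned step and staying there, so that by the final period every classroom has received the intervention.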
Of course, these designs are better suited for large-scale evaluations of interventions that have already been tested for efficacy. In more resource-limited settings or for smaller-scale implementations, other RCT designs can be tried. Cross-over designs are one possible approach (and work best for investigating new delivery approaches or methods of instruction as opposed to a new body of knowledge). In these designs all participants get a chance to be exposed to both interventions. For example, initially Group A may learn a skill through a flipped classroom and Group B may learn the same skill through a traditional approach. After an outcome evaluation, Group B may learn another skill through the flipped classroom and Group A may learn it through the traditional approach. The advantage is that the intervention is tested across two skill domains and all participants get a chance to be exposed to it. Unlike in pharmacological research, there can be no ‘washout’ period in education; thus researchers require multiple content or domain areas for a true cross-over, and the study will take more time to conduct.
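The Group A / Group B example above can be sketched as a small allocation routine. This is a hedged illustration, not a recommended procedure: the function name `crossover_allocation`, the labels "skill 1" and "skill 2", and the simple half-and-half split are assumptions made for the example.

```python
import random


def crossover_allocation(participants, seed=None):
    """Randomly split participants into two sequences for a two-period cross-over.

    Assumed design for this sketch: two different skills stand in for the two
    periods, because learning cannot be 'washed out' between periods.
    One sequence gets flipped then traditional; the other gets the reverse.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    plan = {}
    for i, person in enumerate(shuffled):
        if i < half:
            # Group A: flipped classroom first, traditional second.
            plan[person] = [("skill 1", "flipped"), ("skill 2", "traditional")]
        else:
            # Group B: traditional first, flipped classroom second.
            plan[person] = [("skill 1", "traditional"), ("skill 2", "flipped")]
    return plan


if __name__ == "__main__":
    learners = [f"learner_{n}" for n in range(1, 7)]  # hypothetical IDs
    for person, sequence in crossover_allocation(learners, seed=7).items():
        print(person, sequence)
```

Every learner ends up exposed to both methods of instruction, and each skill is taught by both methods across the two groups, which is what allows the comparison across two skill domains.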
These designs come with their own challenges and special techniques. First, neither the statistical analysis nor the calculation of sample size is straightforward any longer. Second, as with all RCTs, they require careful planning and resource investment; thus they are best conducted when there are true concerns about the potential success of a new approach and when evidence on implementation is weak. Reading widely about best practices and getting a methodological consult are likely good first steps before embarking on an RCT, whether it is a simple parallel design, a cluster trial, or a cross-over trial.
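As one illustration of why sample size is less straightforward once clusters are involved, the sketch below applies the standard design effect for a parallel cluster-randomized trial with equal cluster sizes, DE = 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. The function names and the numbers in the usage example are hypothetical, and this simple formula does not cover stepped-wedge or cross-over designs, which need their own methods and are a good reason to seek a methodological consult.

```python
import math


def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Inflate a sample size computed for individual randomization.

    n_individual is the total sample size an individually randomized trial
    would need; the result is rounded up to a whole participant.
    """
    return math.ceil(n_individual * design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical inputs: 128 learners needed under individual randomization,
    # classrooms of 20 learners, and an assumed ICC of 0.05.
    n = inflated_sample_size(128, cluster_size=20, icc=0.05)
    print(n, "learners, i.e. about", math.ceil(n / 20), "classrooms")
```

Even a modest correlation among learners in the same classroom can nearly double the number of participants required, which is why the ICC assumption deserves attention at the planning stage.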
Key Learning Points
- Educational RCTs often generate evidence under the best possible conditions for efficacy; be wary of immediately generalizing their findings and consider things like similarity of contexts, population, and teacher expertise.
- Cluster and stepped-wedge RCT designs can be implemented in more practical settings when there are sufficient resources; they can sample a variety of contexts relevant to everyday education.
- Cross-over designs are best suited for studying changes in the way education is delivered or provided and more feasible for low-resource settings.
Further Reading
- Tolsgaard MG, Kulasegaram KM, Ringsted C. Practical trials in medical education: linking theory, practice and decision making. Med Educ. 2017;51(1):22-30. This conceptual paper lays out the limitations of current approaches to generating ‘practical evidence’ and suggests some ways of thinking – as well as design considerations beyond just the RCT method – to collect practical and actionable evidence in medical education.
- Bilimoria KY, Chung JW, Hedges LV, et al. National cluster-randomized trial of duty-hour flexibility in surgical training. N Engl J Med. 2016;374(8):713-27. This study is a wonderful example of a large-scale cluster trial and provided some interesting (and counter-intuitive) findings on a wide variety of outcomes. Of course, not all trials can be this large (the n is over 4000!), but some of the issues in design, analysis, and accounting for cluster variation are relevant.