EDUCATION THEORY MADE PRACTICAL – Volume 3, Part 6: Programmatic Assessment

As part of the ALiEM Faculty Incubator program, teams of 2-4 incubator participants authored a primer on a key education theory, linking the abstract to practical scenarios. For the third year, these posts are being serialized on our blog, as a joint collaboration with ALiEM. You can view the first e-book here – the second is nearing completion and will soon be released.  You can view all the blog posts from series 1 and 2 here.

The ALiEM team loves hearing your feedback prior to publication. No comment is too big or too small, and all will be used to refine each primer prior to eBook publication. (Note: the blog posts themselves will remain unchanged.)

This is the sixth post of Volume 3. You can find the previous posts here: Bolman and Deal’s Four-Frame Model; Validity; Mayer’s Cognitive Theory of Multimedia Learning; The Kirkpatrick Model: Four Levels of Learning Evaluation; and Curriculum Development.

————————————————————————————————————————–

Programmatic Assessment

Authors: Elizabeth Dubey; Christian Jones (@jonessurgery); Annahieta Kalantari (@akkalantari)

Editor: Sara M. Krzyzaniak (@SMKrzyz)

Main Authors or Originators: Lambert W. T. Schuwirth, Cees P. M. Van der Vleuten
Other important authors or works: Roger Ellis, Joost Dijkstra

Part 1: The Hook

Sonja excelled in medical school. She flew through her nonclinical years, earned honors on her rotations, and got outstanding letters of recommendation from her mentors.

Intern year was… different.

Sonja struggled to find her way in the busy urban hospital where she trained. Her patient care dedication never lapsed, but she constantly felt she was barely keeping up with the endless flow of test results and pending procedures. Her interactions with senior residents and attending physicians were all business: here’s what needs to be done, let me know when it is complete. This was mirrored in her regular summative evaluations: “Doing fine”, “Keep reading”, and “Hard worker” were the norm.

When her program director (PD) met with Sonja six months into internship, feedback was limited. “I haven’t heard about any major problems,” the PD noted. “How have you been doing?”

The young doctor was reluctant to discuss her uncertainties, and stuck with “Fine, I think.” The two made plans to discuss her progress again after the annual inservice exam.

That, unfortunately, did not go well. Sonja was shocked to find she had scored in the 14th percentile on the test, and she was therefore expecting the PD’s call. More shock was to come, however; much to her surprise, Sonja was placed on academic probation. She’d be assigned a mentor and a “learning specialist” with whom she was required to meet every other week. If she continued to perform “this poorly”, she was told, she’d be asked to leave the program. The PD asked her if she had considered another specialty.

Part 2: The Meat

Overview

Programmatic assessment utilizes data points from various sources and assessment tools both to make high-stakes decisions and to facilitate learning.

We first must make the distinction between the assessment of learning and the assessment for learning. Assessment of learning is the traditional summative assessment familiar to all of us. This may take the form of a grade or formal report card. Assessment for learning combines the assessment process with the educational process, allowing education to be tailored to the needs of individual students in an ongoing fashion.1 The goal is to make assessment an integral and more relevant aspect of education. Programmatic assessment considers assessment to be as important as the curriculum itself, thus requiring intense planning and review. This type of assessment program allows educators to use assessment as a teaching tool in itself.

A program of assessment is used to collect and combine information from various assessment sources to inform about the strengths and weaknesses of each individual learner. An important part of innovative assessment programs is that information from all assessment sources can be used to inform each competency domain. The weaknesses or deficiencies of some instruments can be compensated by the strengths of other instruments, leading to a diverse spectrum of complementary measurement tools to understand competence as a whole.

In programmatic assessment, individual data points, garnered from individual assessments, are maximized for learning and feedback value, whereas high-stakes decisions on a learner’s competency are based on the aggregation of many data points. Thus, no high-stakes decision is made without a detailed collection of information, supported by thorough measures to ensure its reliability.
Background

The idea of assessment for learning is not new. It was proposed in 1989 by Martinez & Lipson.1 Their interpretation was limited to increased frequency of standardized testing and the use of more feedback, but their views demonstrated a growing awareness of assessment as an important aspect of education.

Over the 20th century, the behavioristic concept of learning prevailed in education, with the belief that competency was achieved after multiple small steps were mastered. In this traditional construct, the onus is on learners to pass a module or test. If they fail, they are remediated and retested until they pass. In this way, assessment is viewed as a checklist. This worked well in a mastery learning view of education theory but now shares the spotlight with newer theories. (See also chapters on educational theories.)

Modern education builds on constructivist learning theories in which learners create their own knowledge and skills through integrated programs that guide and support competence. This new way of understanding education allows educators to take a fresh look at assessment, and in 2005, van der Vleuten & Schuwirth proposed the notion of programmatic assessment.2

Programmatic assessment incorporates traditional assessment instruments into a more modern approach. In this context, each specific assessment is chosen to combine with others to form a robust program catered to the learner. This mitigates the limitations of any single assessment, as the combination creates a thorough overall assessment program.

Soon after the inception of programmatic assessments, Dijkstra published a set of program design guidelines to help make this idea more practical.3 However, as these guidelines were relatively generic, van der Vleuten proposed an integrated model for programmatic assessment that optimized both the learning function and the decision function in competency-based educational contexts.4 This integrated model is specific to constructivist learning programs.

Modern takes or advances

Programmatic assessment works with the concepts of constructivist learning and longitudinal competence development. There is strong emphasis on using feedback to optimize individual learning. There is also a focus on tailoring remediation to the individual student. For many educational programs, this is a radical change.

The Accreditation Council for Graduate Medical Education (ACGME) uses a set of six core competencies to help define the foundational skills every practicing physician should possess. These include practice-based learning and improvement, patient care and procedural skills, systems-based practice, medical knowledge, interpersonal and communication skills, and professionalism.5 Most residency assessment programs use instruments to evaluate the ACGME competencies with the assumption that the competencies are stable, generic, and independent of one another. As a result, assessment innovation has historically meant developing new instruments to measure one of these constructs at a time.1 This narrow view limits the potential for assessment to garner valuable information. For example, if an assessment instrument defines competence as completing a specific set of steps, its results can only indicate whether or not each step was finished successfully.

Some traditional assessments, such as multiple-choice exams, are designed to eliminate information and arrive at a dichotomous decision. Using the “pass-fail” multiple-choice exam as an example, valuable assessment data is discarded along the way: the answers not chosen by the learner, the specific questions answered correctly versus incorrectly, and even the percentage correct.

Aiming to pass examinations such as this can lead to poor learning habits in students. It can also encourage a “grade culture”, where achieving the highest grade is the main objective.5 Many assessment programs pursue objectivity over subjectivity as it is easier to summarize and compare objective information, but choosing to ignore the details of well-gathered subjective evaluations discards the great value of this subjective information.

Although programmatic assessment is well received in educational practice, many find it complex and overly theoretical, and many training programs have yet to develop this type of assessment program. In response, van der Vleuten published a paper in 2014 that provides concrete steps to implement programmatic assessment in an educational program.7

1. Develop a master plan
An overarching structure must be chosen, usually in the form of a competency framework. Here, assessments are taken as single data points with the development of a continuum of stakes, ranging from low- to high-stakes decisions. Depending on the curriculum and the phase of study, the master plan will contain a variety of assessments, a mixture of standardized and non-standardized methods, and the inclusion of modular as well as longitudinal assessment elements.
2. Develop examination regulations that promote feedback orientation
Pass-fail decisions should not be made on the basis of individual data points. Avoid connecting credits to individual assessments as this raises their stakes, causing learners to focus on passing a test over quality feedback and follow-up. In all communication, and in examination regulations, the low-stakes nature of individual assessment should be apparent.
3. Adopt a robust system for collecting information
Create electronic student portfolios. These serve to 1) provide a storehouse for feedback, including assessment feedback, activity reports, learning outcome products, and reflective reports; 2) facilitate the administrative and logistical aspects of assessment; and 3) allow for a quick synopsis of gathered information. The portfolio should be easily accessible from anywhere and adjustable to fit the needs of the assessment program.
4. Assure every low-stakes assessment provides meaningful feedback for learning
Meaningful feedback takes many forms, including review of a multiple-choice test, score reports from standardized tests, and skills-domain or longitudinal overviews of progress test results. Effective narrative feedback from teachers is also essential.
5. Provide mentoring to learners
Feedback should be part of a reflective dialogue, and mentoring is an effective way to set this up.
6. Ensure trustworthy decision-making
High-stakes decisions should be entrusted to individuals with good professional judgement, and procedural safeguards should be in place to ensure this.
7. Organize intermediate decision-making assessments
Intermediate (i.e., mid-course) decisions add credibility to high-stakes decisions and keep the learner in the loop on potential future high-stakes decisions. Intermediate assessments are based on fewer data points than final decisions, and their stakes fall between those of low-stakes and high-stakes decisions. Intermediate assessments are diagnostic (how is the learner doing?), therapeutic (what might be done to improve further?), and prognostic (what might happen to the learner if the current development continues to the point of the high-stakes decision?).
8. Encourage and facilitate personalized remediation
Remediation should result from an ongoing reflective process and is always personalized. The curriculum should be flexible enough to accommodate the successful remediation of a struggling learner.
9. Monitor and evaluate the learning effect of the program and adapt
10. Use the assessment process information for curriculum evaluation
Assessment should both promote learning and determine whether learning outcomes have been achieved; this information can, in turn, be used to evaluate the curriculum.
11. Promote continuous interaction between the stakeholders
12. Develop a strategy for implementation
Programmatic assessment requires a culture change and isn’t easy to achieve in an existing educational practice.5 Programs that have implemented a structured assessment program have found it more useful in identifying deficits and helping to create early intervention programs.6,7

Other examples of where this theory might apply in both the classroom & clinical setting

Programmatic assessment depends greatly upon feedback, multiple assessments that are individually low-stakes, and an environment conducive to constructivist learning. This makes programmatic assessment ideal for clinical training; though clinical decisions are much more high-stakes than those on any written examination, thorough supervision and mentoring allow ongoing subjective assessments of trainees while ensuring patient safety.

Beyond the culture shift toward myriad low-stakes subjective evaluations that programmatic assessment requires, it is easy to imagine that intense assessment of students through a trustworthy and fair process demands a great deal of time and effort. Indeed, it is likely that many programs have failed to meet the burdens of true programmatic assessment for this reason. Programs with many students for a short period of time—for instance, medical student clinical clerkships—may therefore have great difficulty implementing programmatic assessment in their curriculum. Though still possible, doing so demands significant training of faculty to provide meaningful feedback, dedication of assessors to collect and make sense of learning portfolios, and availability of technology to combine individual evaluations into a body of work that yields accurate and reliable assessments.

Annotated Bibliography of Key Papers

Schuwirth LW, Van der Vleuten CP. Programmatic assessment: From assessment of learning to assessment for learning. Med Teach. 2011;33(6):478-485.1

This foundational paper articulates the shift from assessment of learning to assessment for learning. The authors describe the theory, its purpose, and its potential impact on education.

Van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. 12 Tips for programmatic assessment. Med Teach. 2014:1-6.7

Programmatic assessment-for-learning can be applied to any part of the training continuum, provided that the underlying learning conception is constructivist. This paper provides concrete recommendations for implementation of programmatic assessment.

Ellis R. Programmatic Assessment: A Paradigm Shift in Medical Education. All Ireland Journal of Teaching and Learning in Higher Education (AISHE-J). 2016;8(3).6

The author explains programmatic assessment in a more accessible way and discusses how it has shifted the relationship between curriculum and assessment.

Perry M, Linn A, Munzer BW, et al. Programmatic Assessment in Emergency Medicine: Implementation of Best Practices. J Grad Med Educ. 2018;10(1):84-90.9

This paper demonstrates the implementation of an assessment program in Emergency Medicine.

Limitations

Building a program of assessment from the ground up is no easy task. It requires multiple active participants with in-depth knowledge of medical education and the evaluation process. It is also essential that each step is carried out adequately so that the resulting infrastructure is sustainable and can withstand the dynamism of the program.

Such a multifaceted program of assessment can be a difficult beast to maintain. Extreme organization and oversight from the program are needed for its success. Much of this responsibility lies with the leading educators, likely the program director or assistant program directors in a residency program. Yet given the necessary time commitment, responsibilities will likely have to be delegated, and this is where organization may begin to unravel. With good leadership this outcome is unlikely, but it should be noted.

A program of assessment depends heavily on the judgement of professionals. Much of this happens at the low-stakes level of individual feedback: single feedback points are combined qualitatively and direct the higher-stakes decisions about learners. Given the fallibility of educators, a program runs the risk of basing its assessments heavily on biased feedback. Some of this can be mitigated by training and by drawing conclusions from multiple data points, but bias is difficult to eradicate completely.

Part 3: The Denouement

The program considers itself to have a robust assessment program and is surprised to find that Sonja feels blindsided by her remediation.

They now realize they missed a crucial step in setting up their program of assessment. While they’ve done well managing decisions that are intermediate-stakes (such as whether to put a resident on probation) or high-stakes (like whether to promote a resident to the next year), they have failed to establish a healthy system for low-stakes assessments due to inadequate and infrequent feedback. Without this, the higher-stakes decisions are of little value. Additionally, the program recognizes the lack of an evaluation process for the “knows” level of Miller’s pyramid. Sonja’s probation was the result of a single data point: her in-service training exam (ITE) score. Had she received more frequent low-stakes assessment of her medical knowledge, her poor ITE performance might have been anticipated and addressed.

The program sees the flaws in their feedback system and decides to train their faculty in quality feedback, with the expectation that trainers give on-shift and post-procedure feedback both verbally and through an online evaluation that becomes part of the resident’s e-portfolio. Additionally, the program dedicates resources to develop its faculty in bedside teaching methods such as the Socratic method to allow for more frequent assessments of learner foundation knowledge.

The program director assigns Sonja a mentor. Together, they review the analysis of her ITE scores and reflect on why she scored low. From this they create a personalized and detailed study plan. The mentor frequently “checks in” with Sonja to make sure she is on track and help facilitate her progression. Sonja knows her clinical and academic weaknesses and has a plan for growth. She now feels the program is invested in her and her success.

Don’t miss the 7th post in the series, coming out Tuesday, June 11, 2019!

PLEASE ADD YOUR PEER REVIEW IN THE COMMENTS SECTION BELOW

Reference List

1. Schuwirth LW, Van der Vleuten CP. Programmatic assessment: From assessment of learning to assessment for learning. Med Teach. 2011;33(6):478-485.

2. van der Vleuten CP, Schuwirth LW. Assessing professional competence: from methods to programmes. Med Educ. 2005;39(3):309-317.

3. Dijkstra J, Galbraith R, Hodges BD, et al. Expert validation of fit-for-purpose guidelines for designing programmes of assessment. BMC Med Educ. 2012;12:20.

4. van der Vleuten CP, Schuwirth LW, Driessen EW, et al. A model for programmatic assessment fit for purpose. Med Teach. 2012;34(3):205-214.

5. Holmboe ES, Edgar L, Hamstra SJ. The Milestones Guidebook. Accreditation Council for Graduate Medical Education; 2016.

6. Ellis R. Programmatic Assessment: A Paradigm Shift in Medical Education. All Ireland Journal of Teaching and Learning in Higher Education (AISHE-J). 2016;8(3).

7. van der Vleuten CP, Schuwirth LW, Driessen EW, Govaerts MJ, Heeneman S. 12 Tips for programmatic assessment. Med Teach. 2014:1-6.

8. Hauff SR, Hopson LR, Losman E, et al. Programmatic assessment of level 1 milestones in incoming interns. Acad Emerg Med. 2014;21(6):694-698.

9. Perry M, Linn A, Munzer BW, et al. Programmatic Assessment in Emergency Medicine: Implementation of Best Practices. J Grad Med Educ. 2018;10(1):84-90.