EDUCATION THEORY MADE PRACTICAL - Volume 3, Part 3: Mayer’s Cognitive Theory of Multimedia Learning

As part of the ALiEM Faculty Incubator program, teams of 2-4 incubator participants authored a primer on a key education theory, linking the abstract to practical scenarios. For the third year, these posts are being serialized on our blog, as a joint collaboration with ALiEM. You can view the first e-book here – the second is nearing completion and will soon be released.  You can view all the blog posts from series 1 and 2 here.

The ALiEM team loves hearing your feedback prior to publication. No comment is too big or too small and they will be used to refine each primer prior to the eBook publication.  (note: the blog posts themselves will remain unchanged)

This is the third post of Volume 3. You can find the previous posts here: Bolman and Deal’s Four-Frame Model and Validity.


Mayer’s Cognitive Theory of Multimedia Learning

Authors:  Laurie Mazurik; Elissa Moore (@ElissaMoore3); Megan Stobart-Gallagher; Quinn Wicks

Editor:  Daniel W. Robinson

Main Authors or Originators:  Richard E. Mayer, Roxana Moreno

Other important authors or works: Ruth Colvin Clark

Part 1:  The Hook

Dr. Assistant Professor just had his annual end-of-year review with the department chairman. He has received good evaluations from the majority of the residents, who complimented his on-shift and bedside teaching methods. However, his evaluations for his module and grand rounds lectures are below average when compared to his peers. The residents have commented that he often reads from his slides, that his PowerPoint slides are hard to read because they have too many words, and that they do not get much out of his presentations. His co-faculty have also commented that his PowerPoint presentations often lack images or graphics. His chairman recommends several FOAM sites and blogs for advice on how to create more engaging presentations, but Dr. Assistant Professor is still struggling to better incorporate multimedia into his talks. He just does not understand what he is reading.

What does it mean to use more images?

What words should I use?

How will the group get the core content if it is not written down on the slides?

How will they retain anything from a presentation full of images?

Part 2:  The Meat


Mayer’s cognitive theory of multimedia learning was developed to foster meaningful learning, or a deeper understanding of the material presented. The theory is based on the premise that students learn more from pictures and words (either spoken or printed) than from words alone because of the way the brain processes information. It is built on three core principles:1,2

  1. Dual Channel Principle: Our brain processes information across two channels depending on how information is presented: with auditory or visual stimulation.
  2. Limited Capacity Principle: The brain can be overwhelmed easily as no one person has infinite space and/or memory – so do not overwhelm learners with information. Our brain will choose what to pay attention to.
  3. Active Processing Principle: In order for learning to occur, our brain must convert information from sensory memory to working memory, processing presented information by creating mental models of information as it is presented.

This theory also describes pitfalls to avoid because they create “cognitive overload,” which occurs when processing demands overwhelm the learner’s capacity. Cognitive overload can significantly impede a learner’s ability to achieve meaningful learning, retain material, and solve future problems based on what they were taught (transfer).1


As discussed above, the original premise of this work was to help multimedia instruction achieve meaningful learning. Multimedia learning is defined as learning from words and pictures, while multimedia instruction is presenting words and pictures in order to teach. Meaningful learning is defined as achieving a deep understanding of material, organizing it into a structure one can understand, and then integrating it into one’s existing core knowledge. Studies beyond Mayer’s original work show that long-term transfer and long-term retention occur more frequently when a medical curriculum adheres to this theory.2 To understand how multimedia instruction can be a useful teaching modality, one must first understand how the mind works.1

To revisit the three principles of Mayer’s theory (the dual channel assumption, the limited capacity assumption, and the active processing principle), we must first review how the brain processes information. As shown in Figure 1, words and/or images enter through the two channels (visual or auditory). They are then taken into sensory memory, where the brain selects what to process first in working memory and then creates a mental framework or model of the information via active processing. This model can then be integrated with prior knowledge, applied to new concepts, and stored in long-term memory. Cognitive overload can occur at many points in this process, and since we all have limited capacity, deciding how many images, words, etc. to include in a multimedia presentation is a delicate balance. Cognitive overload in the form of interesting but extraneous details not relevant to the core material has been shown to decrease processing during learning.3

Figure 1: Cognitive theory of multimedia learning 1

In order to create effective instruction, one must take time to figure out what needs to be learned and to what level of application, as well as how learning will be measured: essentially the HOW of teaching. This starts with the creation of instructional objectives, or the WHY of the lesson, followed by the WHAT, the meat of the subject matter to be taught, and then the HOW DID WE DO, assessed by measuring retention.

When developing instructional design, consider the above principles of how one learns and how much one can learn and process, while trying to avoid cognitive overload. These considerations can be broken down further into:1,4,5,6

  1. Ways to avoid extraneous processing
    • Take out extraneous material, although it may be entertaining (Coherence principle)
    • Highlight essential core concepts (Signaling principle)
    • If using words, put them near their visual counterparts (Spatial contiguity)
    • If using words, show them at the same time as their visual counterparts (Temporal contiguity)
  2. Ways to manage essential processing
    • Pre-train with key concepts
    • Break down into smaller segments controlled by the learner (Segmenting principle)
    • Present words in spoken form instead of written form (Modality principle)
  3. Ways to foster generative processing allowing for learner to organize material
    • Words + Pictures > Words OR Pictures alone, but you do not need spoken words, pictures, AND written text.
    • Conversational tone > Formal tone (e.g., use YOUR instead of THE when describing a body system)
    • Human voice > computerized voice
    • Image principle – do not include your own image!

The statement above that words plus pictures outperform either alone is the core of the Multimedia principle of instructional design: learning from words and pictures together shows higher retention and transfer of material when compared to words or pictures alone.1,2,4,5,6

Modern takes or advances

A simple Google search turns up endless websites, blogs, podcasts, books, etc. on how to create a stellar presentation. Think about your favorite TED talks or any Free Open Access Medical Education (FOAMed) resources and you can see how the multimedia principles have been applied (or not applied). Some of the most prolific applications occur in formal presentation series such as Keynotable, hosted by Haney Mallemat, MD, or in the “Presentation Zen” series by Garr Reynolds. You can also see this published beyond the medical education literature in business resources like the Harvard Business Review, which highlights P-cubed, or the three P’s of presenting, as an offshoot of parts of multimedia theory.

In Emergency Medicine education right now, this is seen most commonly in segmented video resources coupled with podcasts or verbal discussions. Hippo EM is an excellent example: simple video images are played while verbal discussion takes place. At the end of each video series, there are questions to test retention and, hopefully, as you move through the videos grouped by body system, you can build on previous knowledge.

Other examples of where this theory might apply in both the classroom & clinical setting

In medical education, this is obviously a theory on which classroom learning can be built, but it extends further. In a flipped classroom approach, a multimedia module can present key concepts ahead of time, and more detailed in-person lectures can then reaffirm those concepts by verbalizing most of the information alongside highlighted images. This approach can stretch from undergraduate medical education to graduate medical education and beyond to faculty development.

In the clinical arena, patients benefit tremendously from visual stimuli when trying to understand disease processes or plans. We do not always remember to use appropriate layman’s terms despite literature instructing us to write discharge summaries at a 6th-grade reading level.7 If we begin using this modality in the clinical arena, with white boards at the bedside, iPads, and digitally recorded instructions with representative drawings, we may be able to improve both patient satisfaction and, potentially, long-term outcomes through better patient understanding.

Annotated Bibliography of Key Papers

Mayer R. Applying the science of learning to medical education. Medical Education. 2010;44:543-549.4

This article mirrors much of the original theory, but specifically lays out more easily understandable principles of reducing extraneous processing, managing essential processing, and fostering generative processing through some specific medical education examples. Prior to that, it focuses on creating instructional objectives, which should serve as the basis and framework for the creation of your educational content.

Huang C. Designing high-quality interactive multimedia learning modules. Computerized Medical Imaging and Graphics 29 (2005); 223-233.8

This really speaks to a generational approach of incorporating technology into education. As we see a trend of students going online for more easily digestible information (i.e. FOAMed), it is vital to know how this content can be created using researched methods. This paper describes best practice guidelines for the creation of educational multimedia design from concept to reality in a step-wise fashion: 1) Understand the problem and needs while creating goals; 2) Design content; 3) Build interactivity into the module for self use; 4) Test and evaluate; and 5) Take feedback and redesign.  While not all educators will be building an interactive modular curriculum, the basics of design, creation, and assessment with redesign cannot be ignored.

Clark R, Mayer R. E-Learning and the science of instruction: Proven guidelines for consumers and designers of multimedia learning. 2011. Wiley and Sons.9

While this last suggestion is a text instead of a key paper, it has several chapters dedicated specifically to teaching an instructor how to create electronic educational resources, including chapters on multimedia learning that focus on using words and graphics together in lieu of either alone. It also has several chapters on additional principles, including personalization, redundancy, coherence, and contiguity, as well as segmenting lessons, etc. While specifically focusing on multimedia learning, the 23 pages take the reader on a journey through the background material as well as recommendations on how to illustrate specific content types, including topic maps that help create organization and prompt mental mapping for learner processing.

Limitations


The theory of multimedia learning itself is very basic and seems like an easy concept for teaching: Words + Images are better than either alone. However, the limitation can lie in the attempted execution of this theory. Creation of instructional objectives could be an entire course unto itself, so creating objectives and then translating them into a useful multimedia presentation that produces meaningful learning requires time, patience, and a lot of practice from module creators, both in creating the material and in delivering it if done in a live setting.

Another potential limitation is that learners vary in their ability to process information and have different thresholds for cognitive overload. Although many ideas about how we learn (as primarily auditory vs. tactile learners, etc.) have been debunked, the dual channel process may work really well for some and not well for others based on how they have adapted their processing over time. Some adult learners may be stuck in old ways, and their brains will have to be rewired to process, learn, and retain in this format. With regard to overload, a lot of conversation may actually overload one learner, while polite conversational instruction may cause others to tune out completely. It may be a challenge to use a blanket modality to instruct a large group of learners.

Part 3:  The Denouement

Dr. Assistant Professor, having gained a better understanding of the dual channel principle, has learned that his learners would benefit the most if he added pictures to his lectures. Pictures facilitate the construction of the mental framework needed not only to keep his learners engaged, but also to help them retain meaningful concepts. By adding images to the spoken word of his lecture, he is able to recruit an entire additional sensory modality and its subsequent neural power to boost the memory formation of his learners. He now knows that his learners retain concepts better with a combination of words and pictures. By replacing the walls of text in his previous lectures with imagery, he is able to better engage his learners, as their mental energies will be spent absorbing his spoken word instead of reading words on the screen. Through his understanding of the Limited Capacity Principle, he knows that large amounts of text are far more likely to overwhelm his learners than to help them follow along with his lecture.

While he has significantly reduced the amount of text in his lectures, he has elected to use words on his slides in small doses during particular parts of his lecture. He takes care to ensure the words on the screen line up with the timing of his spoken word, as per the Temporal Contiguity Principle, and to make sure they align spatially with relevant images. He takes care to speak plainly and to remove extraneous details from his lectures to reduce cognitive overload.

While he is initially concerned that his learning group will not understand the core content because it is no longer plainly written on his lecture slides, his fears are put to rest when the strong combination of visual and auditory learning modalities, combined with a reduction in extraneous text and other details, results in greater conceptualization for more learners with improved retention.