Overview of multimodal literacy

A multimodal text conveys meaning through a combination of two or more modes. For example, a poster conveys meaning through a combination of written language, still image, and spatial design. Each mode has its own specific task and function in the meaning-making process (Kress, 2010, p. 28), and usually carries only part of the message in a multimodal text. In a picture book, the print and the images both contribute to the overall telling of the story, but they do so in different ways.

Images may simply illustrate or expand on the written story, or they may tell different aspects of the story, even contradicting the written words (Guijarro and Sanz, 2009, p. 107).

Effective multimodal authors creatively integrate modes in various configurations to convey the intended meaning coherently, ‘moving the emphasis backwards and forwards between the various modes’ (Cope and Kalantzis, 2009, p. 423) throughout the text.

The complexity of the relationships between the various meaning-making (semiotic) systems in a text generally increases with the number of modes involved. For example, a film is a more complex multimodal text than a poster because it dynamically combines the semiotic systems of moving image, audio, spoken language, written language, space, and gesture (acting) to convey meaning.


The following overview of how meaning can be composed through the semiotic resources of each mode (spoken language, written language, visual, audio, gestural, and spatial) is informed by The New London Group (2000), Cope and Kalantzis (2009), and Kalantzis, Cope, Chan, and Dalley-Trim (2016).

There is currently extensive pedagogic support for teaching meaning making through spoken and written language, and some resources have been developed to support teaching meaning making in the visual mode through ‘viewing’. However, few resources are yet available for teaching young students how to comprehend and compose meaning in the other modes.

Written meaning

Conveyed through written language via handwriting, the printed page, and the screen. Choices of words, phrases, and sentences are organised through linguistic grammar conventions, register (where language is varied according to context), and genre (knowledge of how a text type is organised and staged to meet a specific purpose). See: Writing; Reading and Viewing.

Spoken (oral) meaning

Conveyed through spoken language via live or recorded speech, which can be monologic or dialogic. Choices of words, phrases, and sentences are organised through linguistic grammar conventions, register, and genre. Composing oral meaning includes choices about mood, emotion, emphasis, fluency, speed, volume, tempo, pitch, rhythm, pronunciation, intonation, and dialect. See: Speaking and listening pedagogic resources.

Visual meaning

Conveyed through choices of visual resources, in both still and moving images. Visual resources include: framing, vectors, symbols, perspective, gaze, point of view, colour, texture, line, shape, casting, saliency, distance, angles, form, power, involvement/detachment, contrast, lighting, naturalistic/non-naturalistic, camera movement, and subject movement. See: Visual literacy metalanguage.

Audio meaning

Conveyed through sound, including choices of music, ambient sounds, noises, alerts, silence, and natural/unnatural sounds, and the use of volume, beat, tempo, pitch, and rhythm.

Spatial meaning

Conveyed through the design of spaces, using choices of spatial resources including scale, proximity, boundaries, direction, layout, and the organisation of objects in the space. Space extends from the design of a page in a book, graphic novel, or comic, a webpage on the screen, and the framing of shots in moving image, to the design of a room, architecture, streetscapes, and landscapes.

Gestural meaning

Conveyed through choices of body movement: facial expression, eye movements and gaze, demeanour, gait, dance, acting, and action sequences. It also includes the use of rhythm, speed, stillness, and angles, including ‘timing, frequency, ceremony and ritual’ (Cope and Kalantzis, 2009, p. 362).

Types of multimodal texts

Multimodality does not necessarily involve technology; multimodal texts can be paper-based, live, or digital.

Paper-based multimodal texts include picture books, textbooks, graphic novels, comics, and posters.

Live multimodal texts, for example dance, performance, and oral storytelling, convey meaning through combinations of various modes such as gestural, spatial, audio, and oral language.

Digital multimodal texts include film, animation, slide shows, e-posters, digital stories, podcasts, and web pages.

Why teaching multimodal literacy is important

Effective contemporary communication requires young people to be able to comprehend, respond to, and compose meaning through multimodal texts in diverse forms.

To do this, students need to know how each mode uses unique semiotic resources to convey meaning (Kress, 2010), and this needs to be taught explicitly. In a visual text, for example, people, objects, and places can be represented through choices of visual semiotic resources such as line, shape, size, and symbols, while written language conveys this meaning through sentences built from noun groups and adjectives (Callow, 2013), written or typed on paper or a screen.

Students also need to be taught how authors juggle the different modes to find the most apt way to tell their story, and how meaning in a multimodal text is ‘orchestrated’ through the selection and use of different modes in various combinations (Jewitt, 2009, p. 15).

Modes and meaning making: three sub-strands

Students need to understand how authors can control and use the unique semiotic resources available in each mode used in a multimodal text. The Victorian Curriculum currently organises teaching about language around three types of meaning, structured as sub-strands: Expressing and developing ideas; Language for interaction; and Text structure and organisation. Teaching meaning making in other modes can be approached through the same three sub-strands.

Expressing and developing ideas

What is happening in the text? Students learn how the different meaning-making resources can be used to construct the nature of events, the objects and participants involved, and the setting and circumstances in which they occur (who, what, where and when), and to express actions and ideas.

Interacting and relating with others

How do we interact with and relate to others? How do we feel? Students learn that designing interactive meaning in a multimodal text involves considering the social setting, how interactions between the viewer/reader/listener and the subject can be established, and how relationships are built and maintained. Students need to learn how to express knowledge, skills, feelings, attitudes and opinions, credibility, and power through different modes.

Text structure and organisation

How do design and layout build meaning and guide the reader/viewer/listener through the text? Students learn how different modes are used to structure a text in particular ways to create cohesive and coherent texts of varying complexity. For example, students learn how the image maker guides the viewer through the text by making deliberate visual design choices at the level of the whole text and of the components within it. In examining how an image or text is organised, students learn how visual design choices can prioritise some meanings and background others (Painter, Martin, & Unsworth, 2013).

(For further information, see Anstey and Bull, 2009; Callow, 2013; Cloonan, 2011; Kalantzis, Cope, Chan, and Dalley-Trim, 2016.)


Anstey, M., & Bull, G. (2009). Using multimodal texts and digital resources in a multiliterate classroom. In e:lit (Vol. 004, pp. 1-8). Sydney: Primary English Teaching Association.

Callow, J. (2013). The Shape of Text to Come: How Image and Text Work. Sydney: Primary English Teaching Association of Australia.

Cloonan, A. (2011). Creating multimodal metalanguage with teachers. English Teaching, 10(4), 23.

Cope, B., & Kalantzis, M. (2009). A grammar of multimodality. The International Journal of Learning, 16(2), 361-423.

Guijarro, J. M., & Sanz, M. J. (2009). On interaction of image and verbal text in a picture book: A multimodal and systemic functional study. In E. Ventola & J. M. Guijarro (Eds.), The World Told and the World Shown: Multisemiotic Issues (pp. 107-123). Palgrave Macmillan.

Jewitt, C. (Ed.). (2009). The Routledge Handbook of Multimodal Analysis. London: Routledge.

Kalantzis, M., Cope, B., Chan, E., & Dalley-Trim, L. (2016). Literacies (2nd ed.). Port Melbourne, VIC, Australia: Cambridge University Press.

Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. London; New York: Routledge.

Painter, C., Martin, J. R., & Unsworth, L. (2013). Reading Visual Narratives: Image Analysis of Children's Picture Books. Equinox Publishing Limited.

The New London Group. (2000). A pedagogy of multiliteracies: Designing social futures. In B. Cope & M. Kalantzis (Eds.), Multiliteracies: Literacy Learning and the Design of Social Futures (pp. 9-38). South Yarra: Macmillan.