Creating multimodal texts

Multimodal texts combine two or more modes such as written language, spoken language, visual (still and moving image), audio, gestural, and spatial meaning (The New London Group, 2000; Cope and Kalantzis, 2009). Creating digital multimodal texts involves the use of communication technologies; however, multimodal texts can also be paper-based or live performances.

The Victorian Curriculum recognises that students need to be able to create a range of increasingly complex and sophisticated spoken, written, and multimodal texts for different purposes and audiences, with accuracy, fluency and purpose.

Why teaching creating multimodal texts is important

Creating multimodal texts is an increasingly common practice in contemporary classrooms. Easy-to-produce multimodal texts include posters, storyboards, oral presentations, picture books, brochures, slide shows (e.g. PowerPoint), blogs, and podcasts. More complex digital multimodal text productions include web pages, digital stories, interactive stories, animation, and film.

Student authors need to be able to effectively create multimodal texts for different purposes and audiences, with accuracy, fluency, and imagination. To do this, students need to know how meaning is conveyed through the various modes used in the text, as well as how multiple modes work together in different ways to convey the story or the information to be communicated.

Students need to know how to creatively and purposefully choose how different modes might convey particular meaning at different times in their texts, and how to manipulate the various combinations of different modes across the whole text to best tell their story (Jewitt, 2009). See: Modes.

Examples of texts to create

Below are examples of different forms of texts students might create in the classroom. The complexity of creating texts increases proportionately with the number of modes involved and the relationships between the various semiotic, or meaning making, systems in a text, as well as use of more complicated digital technologies.

Simple multimodal texts include comics/graphic novels, picture books, newspapers, brochures, print advertisements, posters, storyboards, digital slide presentations (e.g. PowerPoint), e-posters, e-books, and social media.

Meaning is conveyed to the reader through varying combinations of written language, visual, gestural, and spatial modes.

Podcasts are also simple to produce, involving combinations of spoken language and audio modes.

Live multimodal texts include dance, performance, oral storytelling, and presentations. Meaning is conveyed through combinations of various modes such as gestural, spatial, audio, and oral language.

Complex digital multimodal texts include live action films, animations, digital stories, web pages, book trailers, documentaries, and music videos. Meaning is conveyed through dynamic combinations of various modes across written and spoken language, visual (still and moving image), audio, gesture (acting), and spatial semiotic resources. Producing these texts also requires skills with more sophisticated digital communication technologies.

What teachers and students need to know

Skilled multimodal composition requires students to have knowledge of the subject or field of the text, and textual knowledge of how best to convey meaning through the text. Digital multimodal authoring also requires knowledge of the technology and of the processes required to produce innovative digital media productions (Mills, 2010).

Textual knowledge encompasses both semiotic knowledge and genre. Semiotic knowledge concerns how each mode conveys meaning in different ways in the text, where each mode has its own specific task and function (Kress, 2010, p. 28) in the meaning making process.

Multimodal authors also need to be able to imaginatively combine different modes in various strategic arrangements throughout the text, for example print and visual semiotic resources in a picture book, to effectively and creatively convey the meaning required.

Genre concerns knowledge of the social functions and contexts in which a text is produced and used, and how the text is organised and staged to meet a specific social purpose (Martin and Rose, 2008). Like writing, successful multimodal composition includes consideration of purpose, audience and text type (for example, to entertain, inform, or persuade).

Technological knowledge concerns knowledge of the technical content as well as of the processes required to produce innovative digital media productions, including knowledge of the machines involved and the media applications (Mills, 2010, p. 224).

Effectively teaching students how to create multimodal texts requires new and diverse literacy skills and semiotic knowledge which, by necessity, extend beyond the realms of traditional print-based literacy into other learning disciplines. Literacy teachers need to draw on expertise, knowledge, and skills from other disciplines to support the development of new literacy competencies. This includes essential aspects from The Arts (music, media, drama, film, and art) and from Information and Communication Technologies (ICT).

Teaching creating multimodal texts: production stages

Teaching creating multimodal texts is based on teaching writing, extended to teaching students how to produce short, purposeful, and engaging texts in different forms and media formats.

Students need to develop increasing control over the different semiotic contributions of each of the modes deployed and, at the same time, attend to creatively combining modes into a meaningful whole (Hull, 2005, p. 234). Pedagogic attention to any technological requirements is also essential.

Teaching creating multimodal texts can be structured in stages around the film production approach. This includes pre-production, production, and post-production.


The pre-production stage

The pre-production stage includes consideration of the topic, the purpose, the audience and the context. The story/content is drafted and organised, and manageable boundaries are established. This includes setting limits on the number of pages in a picture book or slides in a PowerPoint, or time limits for digital productions; 30 to 90 seconds is long enough for novice podcast, film, or animation productions.

The production process is planned. This might include writing a story outline which provides brief information about who, what, where, and when; a script which includes information about the text participants (characters or subjects), dialogue, action, sound effects, and music; and preparing a storyboard to scope the visual design of the text – what is to be shown and how it will be seen. (See Visual metalanguage for more information.)


Image 1: Storyboard example. (Creative Commons BY-ND 4.0)

The production stage

The production stage is where the text is composed or produced. Production can be a simple process using familiar tools and resources or can involve learning to use more complex digital tools including cameras, recording equipment, or digital applications and software.

Complex media production processes can be simplified for the literacy classroom. For example, a simplified approach to creating live action films involves an ‘in-camera’ edit. This requires the whole sequence to be carefully planned first. Beginning with the title shot, the film is shot in sequence, shot by shot, pausing the camera between shots. Sound effects and additional information must be recorded at the same time as the action. Following the final shot, the film is finished, and there is no further editing or post-production. The same approach can be used when recording simple podcasts, as an ‘in-microphone’ edit.

In contrast, a conventional approach to filmmaking or podcast production involves filming or recording the content in segments first, and then putting the final text together in post-production.

The post-production stage

In the post-production stage, filmed shots or recorded audio segments are edited using a digital editing program to remove sections, to order information, and to add introductions, titles, music, and visual and sound effects.

Using the teaching and learning cycle for creating multimodal texts

The teaching and learning cycle (TLC), initially developed for teaching writing and reading, provides a logical, systematic process for teaching creating multimodal texts (Zammit, 2014, 2015; Chandler, O’Brien and Unsworth, 2010).

This approach supports teaching students how to successfully create a range of different texts for different purposes and audiences, which clearly communicate the author’s meaning (Miller, 2010, p. 214) through attention to meaning design in the different modes deployed.

The teaching and learning cycle focuses on the cyclical nature of the teacher’s role through the various production stages. It includes teacher modelling, and explicit teaching of relevant semiotic knowledge and the metalanguage of meaning making in different modes, as well as the skills required for effective use of any technology involved. Textual knowledge, both semiotic and genre, as well as the technological knowledge required, needs to be explicitly stated and incrementally taught (Christie and Macken-Horarik, 2007). Competent digital authoring requires coherent and systemic levels of pedagogical attention and support, in the same ways that writing is taught and valued in schools (Burn and Durran, 2006).

The TLC involves four key stages which incorporate social support for creating multimodal texts through varied interactional routines (whole group, small group, pair, individual) to scaffold students’ learning about meaning making in a variety of modes and texts.

These stages are:

  • Building the context or field – understanding the purpose of the text and the context (genre) and building shared understanding of the topic
  • Modelling the text (or deconstruction) – the use of mentor or model texts to focus explicitly on the structure of the text, identify the modes used and the different semiotic resources used in each mode, examples of meaning design choices made in different modes, how modes work independently and together to shape meaning, and to build a metalanguage
  • Guided practice (or joint construction) – teachers and students jointly constructing a text
  • Independent construction – students’ independent composing of a new text. (Derewianka and Jones, 2016; Humphrey, 2017; Humphrey and Feez, 2016)

Mentor or model texts need to be carefully selected by the teacher to support the students to work within their ‘zone of proximal development’ (Vygotsky, 1978) in developing their knowledge of how meaning is conveyed in different modes in different texts. Depending on the year level, the selected text and the teaching focus, whole texts or text extracts can be used. See visual metalanguage for examples of visual semiotic resources, and the teaching and learning cycle for further guidance.

Resources to support creating digital multimodal texts

Australian Centre for the Moving Image (ACMI): Film it - The filmmakers tool kit

Australian Children’s Television Foundation (ACTF): Little lunch app (Years 4-9); MY:24 App (Years 4-9)

Creating multimodal texts

Education Department of Victoria, FUSE: search for Filmmaking 101

Education Services Australia (ESA): Scootle (Search by key words such as ‘create’, ‘filmmaking’, ‘comic’, ‘digital story’, ‘poster’, ‘blog’, ‘webpage’, ‘advertisement’, ‘design’. Refine search by year level, and subject area: English.)


References

Burn, A., and Durran, J. (2006). Digital anatomies: Analysis as production in media education. In D. Buckingham and R. Willett (Eds.), Digital generations: Children, young people, and new media (pp. 273-293). New York, London: Lawrence Erlbaum Associates.

Chandler, P. D., O'Brien, A., and Unsworth, L. (2010). Towards a 3D multimodal curriculum for upper primary school. Australian Educational Computing, 25(1), 34-40.

Christie, F., and Macken-Horarik, M. (2007). Building verticality in subject English. In F. Christie and J. R. Martin (Eds.), Language, knowledge and pedagogy: Functional linguistic and sociological perspectives (pp. 156-183). London; New York: Continuum.

Cope, B., and Kalantzis, M. (2009). A grammar of multimodality. The International Journal of Learning, 16(2), 361-423.

Hull, G. (2005). Locating the semiotic power of multimodality. Written Communication, 22(2), 224-261.

Jewitt, C. (Ed.) (2009). The Routledge Handbook of Multimodal Analysis. London: Routledge.

Kress, G. (2010). Multimodality: a social semiotic approach to contemporary communication. London; New York: Routledge.

Martin, J. R., and Rose, D. (2008). Genre relations: mapping culture. London; Oakville, CT: Equinox Pub.

Miller, S. M. (2010). Towards a multimodal literacy pedagogy: Digital video composing as 21st century literacy. In P. Albers, Literacies, Art, and Multimodality (pp. 254-281). Urbana-Champaign, Illinois: National Council of Teachers of English.

Mills, K. A. (2010). What Learners "Know" through Digital Media Production: Learning by Design. E-Learning and Digital Media, 7(3), 223-236.

The New London Group. (2000). A pedagogy of multiliteracies: Designing social futures. In B. Cope and M. Kalantzis (Eds.), Multiliteracies: Literacy Learning and the Design of Social Futures (pp. 9-38). South Yarra: Macmillan.

Zammit, K. (2015). Extending Students’ Semiotic Understandings: Learning About and Creating Multimodal Texts. In P. P. Trifonas (Ed.), International Handbook of Semiotics (pp. 1291-1308). New York, London: Springer.

Zammit, K. (2014). Creating Multimodal Texts in the Classroom: Shifting Teaching Practices, Influencing Student Outcomes. In R. E. Ferdig and K. E. Pytash (Eds.), Exploring Multimodal Composition and Digital Writing (pp. 20-35). Hershey PA: IGI Global.