What is Generative AI?

Troy Heaps


What is Generative AI? What are some key terms to know?

This chapter explores:

the recent proliferation of new AI tools, and some ways they have impacted education;
the origins and evolution of artificial intelligence;
the training methods and inner workings of text-, image-, video-, and audio-producing AI tools.

Since the public release of ChatGPT in November 2022, the use of powerful generative AI tools has become a part of everyday life for hundreds of millions of people. The most recent versions of generative AI tools can produce a wide range of content including text (essays, dialogue, correspondence, code), images, video, and audio.

The increased prevalence of AI tools has had positive and negative impacts on education. After the release of ChatGPT, many educators initially reacted with concern over its potential impact on academic integrity, as they noticed AI tools were able to generate passable content for assignments, essays, and exam questions. Over the past few years, as educators have incorporated AI tools into their courses—creating engaging simulations, providing students with additional formative feedback, and brainstorming lesson plans—sentiment has shifted and many now recognize the benefits of using AI in their teaching practice.

This chapter aims to provide a basic overview of the history of artificial intelligence and the inner workings of modern AI tools, as well as links to further readings.

History of Artificial Intelligence

AI has a history spanning centuries. The following slideshow gives an overview of important milestones in the evolution of AI. It is adapted from Generative Artificial Intelligence in Teaching and Learning at McMaster University, by the Paul R. MacPherson Institute for Leadership, Innovation and Excellence in Teaching, and is used under a CC BY 4.0 license.

How Generative AI Tools Work

This section introduces the basic technologies underlying text, image, video, and audio AI tools. For more information about specific AI tools and their features, please see the Generative AI Tools for Education chapter.

AI Text Generators

Popular AI chatbots like ChatGPT and Gemini use large language models (LLMs) to generate text. LLMs like OpenAI’s GPT-4 and Google’s PaLM 2 are generative AI models that can produce text such as essays, stories, dialogues, and code by predicting likely combinations of tokens (words or strings of characters) for a given task. LLMs are trained on massive volumes of text from online sources, with some leading LLMs processing over 3 billion pages of information as part of their training. This training involves having LLMs repeatedly predict the next word in a text in order to build a model of language patterns and structures. In addition to training on information from the internet, developers use feedback from interactions with users to retrain and fine-tune LLMs.^[1]
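The idea of predicting the next word can be illustrated with a very small sketch. The code below is not how a real LLM works internally (LLMs use neural networks with billions of parameters); it is a toy model, assumed here for illustration only, that counts which word most often follows another in a short training text. The function names and the sample text are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    tokens = text.lower().split()  # naive tokenization: split on whitespace
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, far smaller than anything used for a real LLM.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))      # cat ("cat" follows "the" three times)
print(predict_next(model, "unicorn"))  # None (word never appeared in training)
```

Even in this toy version, the prediction depends entirely on the training text: a word the model has never seen produces no prediction at all.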

For more information about how the technology underpinning LLMs works, please visit What are large language models and A Beginner’s Guide to Neural Networks (Deep Learning), or watch the following video.

AI Image Generators

AI image generators create images from user text prompts. They are trained using advanced machine learning models that analyze vast quantities of images (art, photographs, etc.) and accompanying text descriptions. The most recent versions of image generators like DALL-E and Midjourney use a process called diffusion. Diffusion models are trained by incrementally adding noise to images until they become static ‘TV noise’, then learning to reverse the process and recover the images, which teaches the models how to generate images from random noise. Through this process, the diffusion models learn how to accept user input and generate novel images.^[2]

A set of eight images with corresponding text. In the top left, there is an image of a Shiba Inu dog wearing a collar and looking into the camera. To the right is the same image but slightly fuzzy. The third image along the top is the same image of the dog. This one is very fuzzy, but the dog is still discernible. The last image along the top is pure static noise. Below these four images is an arrow pointing to the right. Below that, text reads: Forward Diffusion. Noise is slowly and iteratively added to corrupt the images in the training set. The goal here is to move them away from their existing "subspace". Below that is another set of four images. The first is pure static noise. To the right is an image of a Shiba Inu dog, wearing a bandana and similar in appearance to the other dog. This image is very fuzzy. The third image in this row is the same image of the dog, but only slightly fuzzy. The final image is of the same dog, and this one is not fuzzy at all. Compared to the first image of a dog in the top row, the image of the dog in this row appears to be AI-generated. The text below this row reads: Reverse Diffusion: Noisy images are iteratively reversed by referring to the steps taken during forward diffusion. There are multiple paths that could bring us back into the image "subspace". Image from University of Toronto Libraries Research Guides-Artificial Intelligence for Image Research: How Generative AI Models Work, used with permission.
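The forward diffusion step described above can be sketched in a few lines. This is a minimal illustration, assuming one common formulation in which each step slightly shrinks the image and adds a small amount of Gaussian noise; the chapter's sources do not specify which formula any particular tool uses. The toy 8-pixel "image", the `beta` noise amounts, and all function names are hypothetical. Reverse diffusion requires a trained neural network and is not shown.

```python
import math
import random


def forward_diffusion_step(pixels, beta, rng):
    """Blend each pixel with Gaussian noise; beta (0 to 1) sets how much noise is added."""
    keep = math.sqrt(1.0 - beta)  # how much of the current image survives
    add = math.sqrt(beta)         # how much fresh noise is mixed in
    return [keep * p + add * rng.gauss(0.0, 1.0) for p in pixels]


def forward_diffusion(pixels, betas, seed=0):
    """Apply the noising step once per beta and return every intermediate image."""
    rng = random.Random(seed)
    trajectory = [list(pixels)]
    for beta in betas:
        trajectory.append(forward_diffusion_step(trajectory[-1], beta, rng))
    return trajectory


def signal_remaining(betas):
    """Fraction of the original image still present after all steps."""
    return math.prod(math.sqrt(1.0 - beta) for beta in betas)


image = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # toy striped "image"
betas = [0.1] * 50                                     # 50 small noising steps
steps = forward_diffusion(image, betas)

print(len(steps))               # 51 (the original image plus 50 noisy versions)
print(signal_remaining(betas))  # a small fraction: the image is mostly noise by now
```

During training, a model sees many such noisy versions and learns to estimate what the less noisy version looked like, which is what later lets it start from pure noise and work toward a new image.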

For more detail on how AI tools produce images, visit the University of Toronto Libraries’ AI for Image Research guide.

AI Video Generators

AI video-producing tools like OpenAI’s Sora are trained in a way similar to LLMs. Whereas an LLM learns to predict text tokens, Sora’s training breaks videos down into patches. Each patch is a small piece of a video, covering only part of the frame area and only a short span of time. Video-producing tools use diffusion to learn how to create original patches, which are refined to create video frames and sequenced to produce a video.^[3]

On the left, a stack of seven images is presented. Only the top image is fully visible. It shows an underwater scene with coral, and a butterfly in the middle. Only the top and left edges of the other six images are visible. They appear to be similar, including underwater scenes and a butterfly. To the right is a right arrow, pointing to a parallelogram with the words Visual encoder in the middle. Continuing to the right is another right arrow, then a three-dimensional grid of small cubes. The grid is the same size and depth as the previous stack of images. To the right is another arrow, showing a two-dimensional series of small cubes that represents one patch from the previous grid. Source: Brooks, Peebles, et al., Video generation models as world simulators, 2024, https://openai.com/index/video-generation-models-as-world-simulators/.
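To make the idea of a patch concrete, the sketch below cuts a tiny toy video into blocks that each cover a few frames and a small region of the frame. It is a simplification under stated assumptions: it works directly on raw pixel positions, whereas the figure above shows the video first passing through a visual encoder, and the function names and patch sizes are hypothetical.

```python
def make_video(num_frames, height, width):
    """Build a toy video in which each pixel stores its own (frame, row, column) position."""
    return [
        [[(t, y, x) for x in range(width)] for y in range(height)]
        for t in range(num_frames)
    ]


def split_into_patches(video, pt, ph, pw):
    """Cut a video into non-overlapping blocks of pt frames x ph rows x pw columns."""
    num_frames, height, width = len(video), len(video[0]), len(video[0][0])
    if num_frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, num_frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Keep only the frames, rows, and columns inside this block.
                patch = [
                    [row[x0:x0 + pw] for row in frame[y0:y0 + ph]]
                    for frame in video[t0:t0 + pt]
                ]
                patches.append(patch)
    return patches


video = make_video(num_frames=4, height=4, width=6)
patches = split_into_patches(video, pt=2, ph=2, pw=3)

print(len(patches))  # 8 patches: 2 along time x 2 along height x 2 along width
```

Each patch plays a role similar to a token in an LLM: it is the small unit the model learns to generate and then assemble into a complete video.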

For more information about the technology underlying AI video generators, please visit AI Video Generation: What is it and how does it work?

AI Audio Generators

AI audio generators can produce various forms of audio including realistic speech, music, sound effects, and ambient noise. AI audio tools are also trained on enormous amounts of data, and use machine learning to recognize and produce rhythm, pitch, tempo, and inflection in a piece of music, voice, or other type of audio.

Most AI audio generators are subscription-based, with limited functionality available for free. ElevenLabs is one tool that allows users to produce a sample audio piece free of charge.

Read more about how AI audio generators work at What is an AI voice generator?

AI Tools of the Future

The capabilities of leading AI tools are expanding rapidly, with new features added to Gemini, ChatGPT, and Copilot nearly every month. Some new features on the horizon include:

image generators that can include high-quality text within images;^[4]
increased integration with voice assistants;^[5]
greater ability to interpret multimedia input;
capacity to complete a series of tasks on a computer or other device—for example, viewing a screen, opening applications, and entering text (the AI tool Claude is already piloting this);
wider availability of generative video functionality;
greater ability for AI tools to carry out a series of interconnected tasks—for example, planning a project, completing steps at the appropriate times, and reporting progress on each step;^[6]
increased integration with the field of robotics.^[7]

It is also becoming easier for educational institutions, departments, and instructors to set up and customize their own AI tool for use in a particular institution, program, or course. Customizing an AI tool allows educators to create engaging learning experiences tailored to course and program learning outcomes. Read more about it in How to Create Custom AI Chatbots that Enrich Your Classroom from Harvard Business Publishing.

Key Takeaways

AI has a long history, and the capability of AI tools has grown rapidly over time.
Generative AI tools turn plain language prompts into text, code, images, videos, and audio.
Generative AI tools are trained on massive amounts of content in the formats they aim to produce—like text, images, and videos.
AI tools learn about users in order to customize interactions. As well, interactions and user feedback contribute to improving future versions of LLMs.

Exercises

How can AI tools be useful in your professional role? Make a list, ensuring that each item aligns with your employer’s institutional policies.
Experiment with an AI tool like Copilot, ChatGPT, or Gemini. Ask it to complete a few tasks related to your work context or out-of-work interests, reflecting on what it does well and what its weaknesses are. Consider which tool you prefer in terms of its interface and performance.
Try generating an image or video using one of the tools mentioned in the AI Image Generators and AI Video Generators sections. How could you use one of these tools in your work?

1. Jamie Amarat Sandhu, "What are LLMs and generative AI? A beginner’s guide to the technology turning heads," University of Toronto, last modified January 25, 2024, https://srinstitute.utoronto.ca/news/gen-ai-llms-explainer.
2. Rachel Gordon, "3 Questions: How AI image generators work," MIT CSAIL, last modified October 27, 2022, https://www.csail.mit.edu/news/3-questions-how-ai-image-generators-work.
3. OpenAI, Tim Brooks, Bill Peebles, et al., "Video generation models as world simulators," February 15, 2024, https://openai.com/index/video-generation-models-as-world-simulators/.
4. Bernard Marr, "The Future of Generative AI: 6 Predictions Everyone Should Know About," Forbes, last modified March 5, 2024, https://www.forbes.com/sites/bernardmarr/2024/03/05/the-future-of-generative-ai-6-predictions-everyone-should-know-about/.
5. Bernard Marr, "The 10 Biggest AI Trends of 2025 Everyone Must Be Ready For Today," Forbes, last modified September 24, 2024, https://www.forbes.com/sites/bernardmarr/2024/09/24/the-10-biggest-ai-trends-of-2025-everyone-must-be-ready-for-today/.
6. Bernard Marr, "The 10 Biggest AI Trends of 2025 Everyone Must Be Ready For Today," Forbes, last modified September 24, 2024, https://www.forbes.com/sites/bernardmarr/2024/09/24/the-10-biggest-ai-trends-of-2025-everyone-must-be-ready-for-today/.
7. Bernard Marr, "The Future of Generative AI: 6 Predictions Everyone Should Know About," Forbes, last modified March 5, 2024, https://www.forbes.com/sites/bernardmarr/2024/03/05/the-future-of-generative-ai-6-predictions-everyone-should-know-about/.

License

Creative Commons Attribution 4.0 International License