Diffusion Models

Technology

A class of generative models used in AI for creating images and video. Sam Altman notes that OpenAI's best image and video models, like Sora, are diffusion models.


First Mentioned

10/12/2025, 6:49:24 AM

Last Updated

10/12/2025, 6:53:39 AM

Research Retrieved

10/12/2025, 6:53:39 AM

Summary

Diffusion models, also known as diffusion-based generative models or score-based generative models, represent a class of latent variable generative models in machine learning. They operate by learning a diffusion process to generate new data elements that are statistically similar to a given dataset. The core mechanism involves a forward diffusion process, where Gaussian noise is gradually added to data, and a reverse sampling process, where a neural network (often a U-net or transformer) is trained to denoise the data. Primarily applied in computer vision for tasks such as image generation, denoising, inpainting, and super-resolution, diffusion models have also found utility in natural language processing, sound generation, and reinforcement learning. Commercial successes like Stable Diffusion and DALL-E leverage these models, often combining them with text encoders for text-conditioned generation.
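
To make the mechanism described in the summary concrete, here is a minimal sketch of the forward noising step and the standard noise-prediction training loss, in the style of denoising diffusion probabilistic models (DDPM). The step count, the noise schedule, and the `model` interface (a U-net or transformer backbone taking a noisy batch and timesteps) are illustrative assumptions, not details taken from the sources below.

```python
# Minimal DDPM-style sketch; hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

T = 1000                                  # number of diffusion steps (assumed)
beta = torch.linspace(1e-4, 0.02, T)      # linear variance schedule (a common choice)
alpha_bar = torch.cumprod(1.0 - beta, 0)  # cumulative signal retention per step

def forward_noise(x0, t):
    """Forward process: sample x_t ~ q(x_t | x_0) in a single closed-form step."""
    eps = torch.randn_like(x0)            # Gaussian noise to be added
    a = alpha_bar[t].view(-1, 1, 1, 1)    # broadcast over image dimensions
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps, eps

def ddpm_loss(model, x0):
    """Train the backbone to predict the noise that was added (epsilon-prediction)."""
    t = torch.randint(0, T, (x0.shape[0],))
    xt, eps = forward_noise(x0, t)
    return F.mse_loss(model(xt, t), eps)
```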

Referenced in 1 Document

Research Data

Extracted Attributes
  • Category

    Generative AI models

  • Mechanism

    Gradually adds Gaussian noise in the forward process; a trained network removes it step by step in the reverse process (stated precisely in the formulas after this list)

  • Also known as

    Diffusion-based generative models, Score-based generative models

  • Core Components

    Forward diffusion process, Reverse sampling process

  • Training Method

    Variational inference

  • NLP Applications

    Text generation, Summarization

  • Other Application Fields

    Natural Language Processing, Sound generation, Reinforcement learning

  • Primary Application Field

    Computer Vision

  • Computer Vision Applications

    Image denoising, Inpainting, Super-resolution, Image generation, Video generation

  • Typical Neural Network Backbone

    U-nets, Transformers

  • Underlying Concept (loosely based on)

    Non-equilibrium thermodynamics

  • Advantages (compared to traditional generative models)

    Better image quality, interpretable latent space, robustness to overfitting
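
The "Mechanism" attribute above can be stated precisely. In the standard formulation, each forward step adds a small amount of Gaussian noise according to a variance schedule β_t, and the reverse process is a learned Gaussian parameterized by the backbone network:

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\bigl(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\bigr),
\qquad
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\bigl(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\bigr)
```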

Timeline
  • Diffusion models were introduced as a method to train a model for sampling from complex probability distributions, utilizing techniques from non-equilibrium thermodynamics. (Source: Wikipedia, Web Search)

    2015

  • As of this year, diffusion models are mainly used for computer vision tasks. (Source: Wikipedia)

    2024

Diffusion model

In machine learning, diffusion models, also known as diffusion-based generative models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of two major components: the forward diffusion process and the reverse sampling process. The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements that are distributed similarly to the original dataset. A diffusion model treats data as generated by a diffusion process, whereby a new datum performs a random walk with drift through the space of all possible data. A trained diffusion model can be sampled in many ways, with different efficiency and quality.

There are various equivalent formalisms, including Markov chains, denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations. Diffusion models are typically trained using variational inference. The model responsible for denoising is typically called the "backbone"; it may be of any kind, but U-nets and transformers are the most common choices.

As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, image generation, and video generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise; the model learns to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise and applying the network iteratively to denoise it.

Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These systems typically combine a diffusion model with other components, such as text encoders and cross-attention modules, to allow text-conditioned generation. Beyond computer vision, diffusion models have also found applications in natural language processing (text generation and summarization), sound generation, and reinforcement learning.
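
The generation procedure described at the end of the extract (start from random noise, apply the network iteratively) corresponds to the standard DDPM ancestral sampler, sketched below. It assumes the same illustrative schedule as the training sketch under the Summary, and `model` is again an assumed noise-predicting backbone.

```python
# DDPM ancestral sampling sketch; schedule and model interface are assumptions.
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)      # same illustrative schedule as above
alpha_bar = torch.cumprod(1.0 - beta, 0)

@torch.no_grad()
def sample(model, shape):
    """Generate data by iteratively denoising pure Gaussian noise."""
    x = torch.randn(shape)                # start from an image of random noise
    for t in reversed(range(T)):
        eps_hat = model(x, torch.full((shape[0],), t))   # predicted noise at step t
        alpha_t = 1.0 - beta[t]
        # Posterior mean: strip the predicted noise contribution for this step.
        x = (x - beta[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alpha_t.sqrt()
        if t > 0:                         # re-inject noise on all but the final step
            x = x + beta[t].sqrt() * torch.randn_like(x)
    return x
```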

Web Search Results
  • A Very Short Introduction to Diffusion Models | by Kailash Ahirwar

    What are Diffusion Models? Diffusion models are a class of generative AI models that generate high-resolution images of varying quality. They work by gradually adding Gaussian noise to the original data in the forward diffusion process and then learning to remove the noise in the reverse diffusion process. They are latent variable models referring to a hidden continuous feature space, look similar to VAEs (Variational Autoencoders), and are loosely based on non-equilibrium thermodynamics.

  • Diffusion model - Wikipedia

    In machine learning, diffusion models, also known as diffusion-based generative models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of two major components: the forward diffusion process, and the reverse sampling process. The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements that are distributed similarly to the original dataset. A diffusion model models [...] As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, image generation, and video generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise. The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise, and applying the network [...] Non-equilibrium thermodynamics: Diffusion models were introduced in 2015 as a method to train a model that can sample from a highly complex probability distribution. They used techniques from non-equilibrium thermodynamics, especially diffusion.

  • An Introduction to Diffusion Models for Machine Learning - Encord

    What are diffusion models? Diffusion models are generative models used for data synthesis. They generate data by applying a sequence of transformations to random noise, producing realistic samples that resemble the training data distribution. [...] Diffusion models are generative models that simulate how data is made by using a series of invertible operations to change a simple starting distribution into the desired complex distribution. Compared to traditional generative models, diffusion models have better image quality, interpretable latent space, and robustness to overfitting. [...] Diffusion models are a promising approach for text-to-video synthesis. The process involves first representing the textual descriptions and video data in a suitable format, such as word embeddings or transformer-based language models for text and video frames in a sequence format.

  • Introduction to Diffusion Models for Machine Learning - AssemblyAI

    Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, Diffusion Models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process. [...] More specifically, a Diffusion Model is a latent variable model which maps to the latent space using a fixed Markov chain. This chain gradually adds noise to the data in order to obtain the approximate posterior q(x_{1:T} | x_0), where x_1, ..., x_T are the latent variables with the same dimensionality as x_0 (the closed-form marginal this chain admits is given after this list). [...] As mentioned above, a Diffusion Model consists of a forward process (or diffusion process), in which a datum (generally an image) is progressively noised, and a reverse process (or reverse diffusion process), in which noise is transformed back into a sample from the target distribution.

  • What are Diffusion Models? | IBM

    Diffusion models are among the neural network architectures at the forefront of generative AI, most notably represented by popular text-to-image models including Stability AI’s Stable Diffusion, OpenAI’s DALL-E (beginning with DALL-E-2), Midjourney and Google’s Imagen. They improve upon the performance and stability of other machine learning architectures used for image synthesis such as variational autoencoders (VAEs), generative adversarial networks (GANs) and autoregressive models such as [...] Diffusion models are generative models used primarily for image generation and other computer vision tasks. Diffusion-based neural networks are trained through deep learning to progressively “diffuse” samples with random noise, then reverse that diffusion process to generate high-quality images.
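
The fixed Markov chain described in the AssemblyAI excerpt above admits a well-known closed-form marginal, which is what makes training efficient:

```latex
q(x_t \mid x_0) = \mathcal{N}\!\bigl(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\bigr),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)
```

So x_t can be drawn directly from x_0 in one step rather than by simulating the whole chain, which is what the training sketch under the Summary relies on.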