A diffusion model is trained to reverse a process that gradually adds noise to data. At generation time it starts from pure noise and removes it step by step, usually guided by a text embedding, until a coherent sample emerges. This is the technique behind most modern open image and video generators.
Open-source diffusion ecosystems provide model weights, community fine-tunes, and pipelines you can run on your own GPU.