Stanford just published a huge 470-page study 📕
"The Principles of Diffusion Models"
Explains how diffusion models turn noise into data and ties their main ideas together.
It starts from a forward process that adds noise over time, then learns to run that process in reverse.
The reverse uses a time-dependent velocity field that tells each sample how to move at every step.
Sampling becomes solving a differential equation that carries noise to data along a trajectory.
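As a toy illustration of that idea, here is a tiny probability-flow sampler for 1-D Gaussian data. It assumes a rectified-flow-style linear path x_t = (1-t)·x0 + t·eps (a common convention, not necessarily the book's exact notation), where the velocity field has a closed form, and Euler steps carry noise at t=1 to data at t=0.

```python
import numpy as np

M, S = 2.0, 0.5          # toy data distribution N(M, S^2) (made-up numbers)

def velocity(x, t):
    """Closed-form velocity E[eps - x0 | x_t] for Gaussian data."""
    a, b = 1.0 - t, t
    var = a * a * S * S + b * b          # Var(x_t)
    cov = b - a * S * S                  # Cov(x_t, eps - x0)
    return -M + cov / var * (x - a * M)

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)           # start from pure noise at t = 1
dt = 1.0 / 1000
for k in range(1000):                    # Euler steps from t = 1 down to t = 0
    t = 1.0 - k * dt
    x = x - dt * velocity(x, t)

print(x.mean(), x.std())                 # should approach M and S
```

Every sample follows the same deterministic trajectory rule; only its random starting point differs.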
There are 3 views of this idea: variational, score-based, and flow-based. All three describe the same underlying process.
There are also 4 training targets: noise, clean data, score, and velocity. All four are equivalent up to simple reparameterizations.
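A sketch of why the four targets are interchangeable, assuming a linear forward process x_t = a·x0 + b·eps (an illustrative convention; the schedule values and numbers below are made up): given a noise prediction, the other three follow by algebra.

```python
# Assumed forward process: x_t = a * x0 + b * eps
a, b = 0.6, 0.8                       # example schedule values at some t
x0, eps = 1.5, -0.3                   # "true" data point and noise draw
x_t = a * x0 + b * eps

eps_hat = eps                         # pretend the network predicted noise perfectly

x0_hat = (x_t - b * eps_hat) / a      # noise  -> clean data
score_hat = -eps_hat / b              # noise  -> score of p(x_t | x0)
v_hat = eps_hat - x0_hat              # noise  -> velocity target (eps - x0)

print(x0_hat, score_hat, v_hat)
```

Training on any one of the four gives a model you can convert to the others at no cost.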
Shows how classifier-free guidance steers outputs toward a prompt or label without training a separate classifier.
Reviews fast solvers that cut steps while keeping quality stable.
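Why higher-order solvers cut steps, in miniature: Heun's method (2nd order) versus plain Euler on dx/dt = -x, a stand-in for a probability-flow ODE (toy example, not a solver from the book).

```python
import math

def euler(x, f, dt, n):
    for _ in range(n):
        x = x + dt * f(x)
    return x

def heun(x, f, dt, n):
    for _ in range(n):
        k1 = f(x)
        k2 = f(x + dt * k1)           # trial Euler step
        x = x + dt * (k1 + k2) / 2    # average the two slopes
    return x

f = lambda x: -x
exact = math.exp(-1.0)                # x(1) for x(0) = 1
err_euler = abs(euler(1.0, f, 1 / 10, 10) - exact)
err_heun = abs(heun(1.0, f, 1 / 10, 10) - exact)
print(err_euler, err_heun)            # Heun is far closer with the same 10 steps
```

Same step count, much smaller error: that is the whole pitch of fast diffusion solvers.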
Explains distillation methods that shrink many sampling steps into a few by mimicking a teacher model.
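Distillation in miniature: a "teacher" that takes many Euler steps on dx/dt = -x, and a one-parameter "student" x -> c·x fitted to mimic the teacher's endpoint in a single jump. Everything here is illustrative, not a method from the book.

```python
import numpy as np

def teacher(x0, steps=100):
    x, dt = x0, 1.0 / steps
    for _ in range(steps):
        x = x - dt * x                # one Euler step for dx/dt = -x
    return x

rng = np.random.default_rng(0)
xs = rng.standard_normal(1000)
ys = teacher(xs)                      # expensive multi-step targets
c = (xs @ ys) / (xs @ xs)             # least-squares fit of the one-step student

student = lambda x: c * x             # one step replaces 100
print(c)                              # close to exp(-1)
```

Real distillation fits a neural network instead of a single scalar, but the recipe is the same: match the teacher's endpoint, skip its trajectory.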
Introduces flow map models that learn direct jumps between times for fast generation from scratch.
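A flow map takes a sample directly from one time to another. For the toy ODE dx/dt = -x the exact map is known in closed form (illustrative only; flow map models learn an approximation), and the checks below are the consistency properties any learned map must satisfy.

```python
import math

def flow_map(x, t, s):
    return x * math.exp(t - s)        # exact solution of dx/dt = -x from time t to s

x = 1.7
same = flow_map(x, 0.4, 0.4)                           # identity at equal times
two_hop = flow_map(flow_map(x, 0.0, 0.5), 0.5, 1.0)    # composition of two jumps
one_hop = flow_map(x, 0.0, 1.0)                        # one direct jump
print(same, two_hop, one_hop)
```

Because one jump equals any chain of smaller jumps, a trained flow map can generate in a single step.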