RE: "interpolate between them, or generalize them further"
Please check out Planned Diffusion, which is a hybrid AR/diffusion model.
"An hour of planning can save you 10 hours of doing."
✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation.
Planned Diffusion runs 1.2–1.8× faster than autoregressive generation and an order of magnitude faster than diffusion, while staying within 0.9–5% of AR quality.