Excited to share these two papers on parallelizing MCMC and theory for parallelizing nonlinear sequence models! This is a fun line of work and there is lots more to explore in these areas.
Collabs with @SkylerWu9, @xavierjgonzalez, @Leokoz8, Ken Clarkson & @scott_linderman
📣Announcing 2 @NeurIPSConf papers!
"Parallelizing MCMC Across the Sequence Length": uses Newton iterations to parallelize MCMC! 🤯
But can we parallelize any nonlinear state space model?
"Predictability Enables Parallelizability": proves what systems we can parallelize. 🧵
Great collaborations with awesome people! I am really excited about this line of work. IMHO this is a fundamentally new way to think about evaluating nonlinear dynamical systems (nonlinear SSMs), and one that can lead to massive practical speedups on GPU.
Parallelizing "inherently sequential" processes has become all the rage in the era of GPUs.
But can we really parallelize anything?
In work led by me and @Leokoz8, we show that the "predictability" of a dynamical system determines whether it can be parallelized efficiently.
@dmzoltowski and @SkylerWu9 put the theory into practice by parallelizing Markov chain Monte Carlo! Turns out MCMC chains are often predictable in practice!
Really exciting work, with clear implications. And yet still so much to discover and explore!
We can use the ungulates (quasi-DEER and more!) to parallelize MCMC (arxiv.org/pdf/2508.18413).
And we have theoretical insight as to which SSMs we can parallelize, and how we should design them (arxiv.org/pdf/2508.16817)
Bright future for parallel nonlinear dynamics! [17/17]
For example, here's a predictable (blue) and unpredictable (red) RNN, evaluated sequentially (top) and in parallel (bottom).
Each parallel step is slower than a sequential one! So we need to converge in a small # of them [15/17]
Consequences:
- predictable dynamics, LLE < 0: as T grows, the PL constant converges to a nonzero limit -> can parallelize quickly ✅
- unpredictable dynamics, LLE > 0: as T grows, the PL constant goes to zero exponentially quickly -> sequential is faster than parallel ❌
[14/17]
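To make both bullets concrete, here is a toy scalar example (my own illustration under simplifying assumptions, not the paper's theorem): the linear chain x_t = a x_{t-1}, whose LLE is λ = log|a|. Its merit function is quadratic, so the PL constant can be computed directly:

```latex
\[
\mathcal{L}(x_{1:T}) \;=\; \tfrac12 \sum_{t=1}^{T} (x_t - a\,x_{t-1})^2 \;=\; \tfrac12 \|Bx - c\|^2,
\qquad
\mu \;=\; \sigma_{\min}(B)^2,
\]
% B is lower bidiagonal (1 on the diagonal, -a on the subdiagonal); \lambda = \log|a| is the LLE.
\[
|a| < 1 \ (\lambda < 0):\quad \mu \ \ge\ (1-|a|)^2 \ \ \text{(no dependence on } T\text{)},
\qquad
|a| > 1 \ (\lambda > 0):\quad \mu \ \le\ |a|^{-2(T-1)} \;=\; e^{-2\lambda(T-1)}.
\]
```

The theorem in the next tweet is the nonlinear generalization of this picture.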
We proved that the merit function's PL constant is bounded above and below by a simple function of the LLE!
We were surprised to find such a tight link between dynamics and optimization! [13/17]
We connect "predictability" and merit geometry rigorously.
For "predictability", the Largest Lyapunov Exponent (LLE) measures how quickly trajectories diverge.
For "flatness", we use the PL constant, which controls how small the gradient can get. [12/17]
What is predictability?
Say we give our state a nudge. Does that nudge blow up or go away?
If nudges blow up, it's hard to predict the future! Think chaos, butterfly effect, the weather.
If nudges go away, we can use nearby states to predict future dynamics! [11/17]
Main finding: predictable dynamics lead to well-conditioned merit landscapes. Easy to optimize ✅
Unpredictable dynamics lead to poorly-conditioned merit landscapes with flat, narrow valleys 😨. These are hard to optimize ❌
Look at this diagram that comes from 2 RNNs! [10/17]
We provide a sharp characterization of when you can and cannot parallelize a nonlinear SSM.
Plus a recipe for designing new fully nonlinear architectures that are parallelizable by design! [9/17]
We are only faster than sequential if DEER converges in a small # of iters.
Why does DEER converge quickly in MCMC?
Empirically, it turns out that MCMC dynamics are often contractive.
Time for our theory contribution: "predictability enables parallelizability"! [8/17]
Our contribution also opens the door to a variety of new approaches to MCMC, such as:
- stopping the DEER iterations early (approximate MCMC)
- mixing parallelization over sequence length and batch
- and more! [7/17]
Using these techniques, we get order-of-magnitude speed-ups in a variety of standard MCMC examples like
- multimodal distributions
- 8 schools
- Bayesian logistic regression (German Credit Dataset)
- Sentiment classification from LLM embeddings (D=768) [6/17]
But in examples with large state size, DEER scales badly. So, we developed new quasi-Newton iterations that:
- work with ~nondifferentiable~ accept/reject steps
- use Hutchinson’s estimator and JVPs to be extra memory and compute efficient
- and more! [5/17]
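To give a flavor of the Hutchinson + JVP trick (a minimal sketch, not the papers' code; the function name and toy dynamics are mine, and I'm assuming the quantity being probed is the Jacobian diagonal of a step): a handful of Rademacher probes and forward-mode JVPs estimate diag(∂f/∂x) without ever forming the full Jacobian.

```python
import jax
import jax.numpy as jnp

def hutchinson_jacobian_diag(f, x, key, num_probes=8):
    """Estimate diag(∂f/∂x) at x via Hutchinson-style probing:
    E[v * (J v)] for Rademacher v, with J v computed by a single JVP,
    so the full Jacobian is never materialized."""
    def one_probe(k):
        v = jax.random.rademacher(k, shape=x.shape, dtype=x.dtype)
        _, jv = jax.jvp(f, (x,), (v,))   # forward-mode product J @ v
        return v * jv                    # unbiased sample of the Jacobian diagonal
    keys = jax.random.split(key, num_probes)
    return jnp.mean(jax.vmap(one_probe)(keys), axis=0)

# Toy usage on a made-up nonlinear step (illustration only):
f = lambda x: jnp.tanh(x) + 0.1 * jnp.roll(x, 1)
diag_est = hutchinson_jacobian_diag(f, jnp.ones(768), jax.random.PRNGKey(0))
```

Each probe costs roughly one extra function evaluation and O(D) memory, which is the kind of saving that matters for large-state chains like the D=768 example above.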
We can put many of the major MCMC workhorses into state space form. These include
- Gibbs
- MALA
- HMC
We can then apply DEER to parallelize! DEER is Gauss-Newton optimization applied to an appropriately defined merit function. [4/17]
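A sketch of what "state space form" can mean here (my own minimal illustration, with hypothetical names like mala_step and log_prob; the papers' formulation may differ): pre-draw all of the randomness, and each MCMC step becomes a deterministic map of the previous state, i.e., a nonlinear SSM.

```python
import jax
import jax.numpy as jnp

def mala_step(x, noise, log_unif, log_prob, step_size=0.1):
    """One MALA step written as a deterministic function of the previous state
    and pre-drawn randomness (Gaussian proposal noise + a log-uniform for the
    accept/reject). With the randomness fixed, the chain x_0 -> x_1 -> ... is
    a nonlinear state space model."""
    grad = jax.grad(log_prob)
    mean_fwd = x + step_size * grad(x)                      # Langevin proposal mean
    prop = mean_fwd + jnp.sqrt(2.0 * step_size) * noise     # proposed state
    mean_rev = prop + step_size * grad(prop)

    # Metropolis-Hastings log acceptance ratio for the asymmetric proposal.
    log_q = lambda y, mean: -jnp.sum((y - mean) ** 2) / (4.0 * step_size)
    log_alpha = (log_prob(prop) + log_q(x, mean_rev)
                 - log_prob(x) - log_q(prop, mean_fwd))

    accept = log_unif < log_alpha            # nondifferentiable accept/reject step
    return jnp.where(accept, prop, x)
```

Scanning this step sequentially is just ordinary MALA; DEER instead guesses the whole trajectory and refines it in parallel, as sketched further down.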
People used to think MCMC was inherently sequential—so how were we able to parallelize it?
The 🪄magic is the DEER/DeepPCR algorithm.
DEER works by making a guess for *all* the states over the entire trajectory, and then iteratively refining them until convergence. [3/17]
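Here's a minimal sketch of that guess-and-refine loop for a generic step x_t = f(x_{t-1}) (a simplified illustration of the DEER idea, with hypothetical names; it assumes full Jacobians and a fixed number of iterations): each refinement linearizes the dynamics around the current guess and solves the resulting linear recurrence for every time step at once with a parallel associative scan.

```python
import jax
import jax.numpy as jnp

def deer_refine(f, x_guess, x0, num_iters=10):
    """Refine a guess for the whole trajectory x_{1:T} of x_t = f(x_{t-1}).
    Each iteration linearizes f around the current guess and solves the resulting
    *linear* recurrence for all time steps at once with an associative scan
    (the Newton / Gauss-Newton view of DEER, in simplified form)."""

    def one_iteration(x, _):
        prev = jnp.concatenate([x0[None], x[:-1]], axis=0)  # x_{t-1} under the current guess
        f_prev = jax.vmap(f)(prev)                          # f(x_{t-1})
        A = jax.vmap(jax.jacobian(f))(prev)                 # Jacobians of f, shape (T, D, D)
        b = f_prev - jnp.einsum("tij,tj->ti", A, prev)      # offsets of the linearization
        b = b.at[0].add(A[0] @ x0)                          # fold the known initial state in

        # Solve x_t = A_t x_{t-1} + b_t for all t in O(log T) parallel depth.
        def compose(left, right):
            A_l, b_l = left
            A_r, b_r = right
            return (jnp.einsum("...ij,...jk->...ik", A_r, A_l),
                    jnp.einsum("...ij,...j->...i", A_r, b_l) + b_r)

        _, x_new = jax.lax.associative_scan(compose, (A, b))
        return x_new, None

    # In practice you would stop once the update is small; a fixed count keeps the sketch simple.
    x_final, _ = jax.lax.scan(one_iteration, x_guess, None, length=num_iters)
    return x_final
```

In this naive form the per-step D×D Jacobians are expensive for large states, which is exactly the regime where the quasi-Newton and Hutchinson tricks above come in.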