High-accuracy sampling for diffusion models and log-concave distributions
Authors
Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin
Abstract
We present algorithms for diffusion model sampling which obtain $δ$-error in $\mathrm{polylog}(1/δ)$ steps, given access to $\widetilde O(δ)$-accurate score estimates in $L^2$. This is an exponential improvement over all previous results. Specifically, under minimal data assumptions, the complexity is $\widetilde O(d\,\mathrm{polylog}(1/δ))$ where $d$ is the dimension of the data; under a non-uniform $L$-Lipschitz condition, the complexity is $\widetilde O(\sqrt{dL}\,\mathrm{polylog}(1/δ))$; and if the data distribution has intrinsic dimension $d_\star$, then the complexity reduces to $\widetilde O(d_\star\,\mathrm{polylog}(1/δ))$. Our approach also yields the first $\mathrm{polylog}(1/δ)$ complexity sampler for general log-concave distributions using only gradient evaluations.
Concepts
The Big Picture
Imagine navigating a vast, foggy terrain with only a compass. You can feel which direction is “downhill” at any point, but you can’t see the full topography. That’s roughly the challenge facing diffusion models, the AI systems behind image generators like DALL-E and Stable Diffusion.
These models create realistic images, molecules, and more by learning to reverse a noisy process: starting from pure static, they gradually refine it into something real. But they have a fundamental limitation. They can only sense the local slope of the probability surface at each step, not the surface itself. So how many steps does it actually take to get where you’re going?
For years, theorists faced a frustrating answer. The best algorithms required steps scaling as a polynomial in 1/δ, where δ is your target accuracy (how close your output needs to be to a true sample). Want to be twice as accurate? That might cost four times as many steps. Ten times more accurate? A hundred times more steps.
This “polynomial curse” made very high accuracy brutally expensive. By contrast, sampling methods that can evaluate the full probability directly, not just the slope, had long enjoyed exponentially fast convergence, needing only logarithmically more steps for each extra digit of accuracy.
Chen, Chewi, Daskalakis, and Rakhlin have now broken through that barrier. They’ve built diffusion-model samplers that achieve δ-accuracy in steps scaling only as polylog(1/δ), something like (log(1/δ))² instead of (1/δ)². That’s an exponential improvement, using nothing but slope (gradient) measurements.
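To make the gap concrete, here is a tiny numeric illustration (ours, not from the paper) of how the two step counts grow as the target accuracy δ shrinks:

```python
import math

# Illustrative step counts for two scalings:
# polynomial (1/delta)**2 vs. polylogarithmic (log(1/delta))**2.
for delta in (1e-2, 1e-4, 1e-8):
    poly = (1 / delta) ** 2
    polylog = math.log(1 / delta) ** 2
    print(f"delta={delta:.0e}: polynomial ~{poly:.0e} steps, polylog ~{polylog:.0f} steps")
```

At δ = 10⁻⁸, the polynomial scaling demands on the order of 10¹⁶ steps while the polylogarithmic one needs a few hundred.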
Key Insight: By simulating the logic of rejection sampling using only gradient information, this work achieves the first exponentially fast convergence guarantees for diffusion models, settling the open question of whether score-based methods can match the efficiency of density-based samplers.
How It Works
The core idea is First-Order Rejection Sampling (FORS). Classical rejection sampling proposes candidates from a simple distribution and accepts or rejects them based on probability density ratios. It’s provably fast and produces exact samples, but it requires evaluating the probability density itself, which diffusion models don’t have access to.
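For intuition, here is a minimal sketch of classical rejection sampling on a toy one-dimensional target (the example and its parameters are ours, not the paper's). Note that the accept/reject test queries the density `target_pdf` directly:

```python
import random

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, M, rng):
    """Classical rejection sampling: propose from a simple distribution and
    accept with probability p(x) / (M * q(x)). Requires density evaluations."""
    while True:
        x = proposal_sample(rng)
        u = rng.random()
        if u < target_pdf(x) / (M * proposal_pdf(x)):
            return x

# Toy target: p(x) = 2x on [0, 1]; proposal q = Uniform(0, 1); envelope M = 2
# ensures p(x) <= M * q(x) everywhere.
rng = random.Random(0)
samples = [rejection_sample(lambda x: 2 * x, lambda r: r.random(),
                            lambda x: 1.0, 2.0, rng)
           for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to E[X] = 2/3
```

The whole difficulty the paper confronts is that diffusion models cannot evaluate `target_pdf` at all; they only see its gradient.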
FORS gets around this by simulating rejection sampling’s logic using only gradient queries. You can’t directly compute probability ratios, but you can approximate them using score function estimates, which measure how sharply and in which direction the log-probability changes at each point. The algorithm stitches these measurements along the diffusion trajectory to reconstruct, approximately, the accept/reject decisions that classical rejection sampling would make.
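One way to see how gradient queries can stand in for density ratios (a simplified illustration of the underlying principle, not the paper's actual FORS procedure): the log-density difference between two points equals the line integral of the score along the segment joining them, which can be approximated by quadrature using score queries alone:

```python
def score(x):
    # Score of a standard 1-D Gaussian: d/dx log p(x) = -x.
    return -x

def log_ratio_from_scores(x, y, n=100):
    """Approximate log p(y) - log p(x) by integrating the score along the
    segment from x to y (midpoint rule). Uses only gradient queries."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        total += score(x + t * (y - x)) * (y - x)
    return total / n

# Sanity check against the exact Gaussian answer: (x**2 - y**2) / 2.
approx = log_ratio_from_scores(0.5, 1.5)
exact = (0.5 ** 2 - 1.5 ** 2) / 2
print(approx, exact)
```

In a diffusion model the score is only estimated, not exact, and part of the paper's contribution is controlling how estimation error propagates into the simulated accept/reject decisions.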
The paper proves three distinct complexity bounds under different assumptions:
- Minimal data assumptions (just finite second moment, meaning the data doesn’t spread infinitely far): δ-error in Õ(d · polylog(M/δ)) steps, where d is the data dimension and M captures the data’s scale.
- Non-uniform Lipschitz score functions: Õ(√(dL) · polylog(M/δ)) steps, where L is the Lipschitz constant, a bound on how quickly slope measurements can change. This is a weaker condition than prior work required.
- Low-dimensional data: Õ(d★ · polylog(M/δ)) steps, where d★ is the intrinsic dimension of the data. If your images truly live on a low-dimensional surface (a manifold) rather than filling the full high-dimensional space, the algorithm exploits this automatically.
All three results achieve polylog(1/δ) accuracy dependence, an exponential improvement over every prior algorithm.
The log-concave sampling result is just as significant. Log-concave distributions, a broad class including Gaussians, exponentials, and many statistical models, were already well-studied. But prior high-accuracy samplers either required density evaluations or very special structure. FORS yields the first polylog(1/δ) complexity sampler for general log-concave distributions using only gradient evaluations, matching results that previously needed zeroth-order queries (direct probability evaluations at each point rather than just slope measurements).
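For contrast, here is the classical gradient-only baseline, the unadjusted Langevin algorithm, whose accuracy degrades only polynomially with the step size (a toy sketch on a Gaussian target; the example and parameters are ours):

```python
import math
import random

def ula(score, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: the textbook first-order sampler.
    Each update queries only the gradient (score), never the density."""
    x = x0
    chain = []
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

# Toy log-concave target: standard 1-D Gaussian, whose score is -x.
rng = random.Random(0)
chain = ula(lambda x: -x, 0.0, 0.1, 50000, rng)
mean = sum(chain) / len(chain)
print(round(mean, 2))  # close to the target mean 0
```

Shrinking the bias of such a chain requires shrinking the step size, which is exactly the poly(1/δ) cost that the first-order rejection sampling approach avoids.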
Why It Matters
Diffusion models now power image synthesis, audio generation, protein structure prediction, and scientific simulation. Their theoretical foundations, though, have lagged behind their empirical success. The fact that existing algorithms had polynomial rather than exponential convergence was a genuine embarrassment: either practitioners were using fundamentally suboptimal algorithms, or the theory was too pessimistic to be useful.
This work resolves that tension.
Beyond generative AI, the result connects to questions in computational statistics and physics. Log-concave distributions show up everywhere in science: statistical mechanics, Bayesian inference, quantum field theory. Efficient sampling from them is essential for Monte Carlo simulations in particle physics, Bayesian analysis of experimental data, and much more.
High-accuracy samplers that rely only on gradient evaluations, not full probability computations, can be deployed in settings where density evaluation is intractable.

Open questions remain. Can similar ideas yield high-accuracy guarantees for non-log-concave targets beyond the diffusion process framework? How do the constants hidden in the polylog bounds behave in practical, high-dimensional regimes?
Bottom Line: By inventing first-order rejection sampling, Chen, Chewi, Daskalakis, and Rakhlin achieve an exponential speedup for diffusion model sampling, collapsing polynomial-in-accuracy step counts to logarithmic ones with no density evaluations required and minimal assumptions on the data.
IAIFI Research Highlights
This work brings together ideas from theoretical computer science (rejection sampling, complexity theory) with the mathematics of stochastic differential equations and score-based generative models, building algorithmic foundations at the intersection of AI and simulation science.
The paper provides the first provably exponentially fast diffusion model sampler using only score (gradient) queries, settling an open question in the theory of generative AI.
High-accuracy log-concave samplers with only gradient access are directly useful for scientific computing, including Monte Carlo methods in lattice field theory, particle physics simulations, and Bayesian inference on experimental data.
Future work may extend these guarantees beyond log-concave and manifold-supported distributions. The paper is available as [arXiv:2602.01338](https://arxiv.org/abs/2602.01338) (Chen, Chewi, Daskalakis, Rakhlin, 2026).