TY - GEN
T1 - Compensation Sampling for Improved Convergence in Diffusion Models
AU - Lu, Hui
AU - Salah, Albert Ali
AU - Poppe, Ronald
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Diffusion models achieve remarkable quality in image generation, but at a cost: iterative denoising requires many time steps to produce high-fidelity images. We argue that the denoising process is crucially limited by an accumulation of reconstruction error due to an initially inaccurate reconstruction of the target data. This leads to lower-quality outputs and slower convergence. To address these issues, we propose compensation sampling to guide the generation towards the target domain. We introduce a compensation term, implemented as a U-Net, which adds negligible computational overhead during training. Our approach is flexible, and we demonstrate its application to unconditional generation, face inpainting, and face de-occlusion on the benchmark datasets CIFAR-10, CelebA, CelebA-HQ, FFHQ-256, and FSG. Our approach consistently yields state-of-the-art results in terms of image quality, while accelerating convergence of the denoising process during training by up to an order of magnitude. (Our code and models will be made publicly available upon acceptance of the paper.)
KW - Diffusion models
KW - Image generation
KW - Iterative denoising
UR - http://www.scopus.com/inward/record.url?scp=85210814518&partnerID=8YFLogxK
DO - 10.1007/978-3-031-73030-6_11
M3 - Conference contribution
AN - SCOPUS:85210814518
SN - 978-3-031-73029-0
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 183
EP - 201
BT - Computer Vision – ECCV 2024 – 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer
T2 - 18th European Conference on Computer Vision, ECCV 2024
Y2 - 29 September 2024 through 4 October 2024
ER -