Training Diffusion Models with Reinforcement Learning

By neub9

How Diffusion Models Can Be Trained with Reinforcement Learning for Improved Performance

Diffusion models have become the standard choice for generating complex, high-dimensional outputs, with applications ranging from AI art and synthetic image generation to drug design and continuous control. They are traditionally trained with maximum likelihood estimation to match the training data, but most applications actually care about a downstream objective rather than likelihood. In this post, we explore how reinforcement learning (RL) can be used to train diffusion models directly on such objectives; a concrete example of a downstream reward follows below.
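To make "downstream objective" concrete, here is a minimal sketch of one such reward: image compressibility, scored as the negative size of the image's JPEG encoding. The function name and scaling are our own illustration, not necessarily the exact implementation used in the work described here.

```python
import io

from PIL import Image


def compressibility_reward(image: Image.Image, quality: int = 95) -> float:
    """Score an image by how well it compresses: smaller JPEG = higher reward."""
    buffer = io.BytesIO()
    # Encode to JPEG in memory; the reward is the negative size in kilobytes.
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    return -len(buffer.getvalue()) / 1024.0
```

A reward like this provides no simple likelihood-based training signal, which is exactly why an RL formulation is attractive.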

To achieve this, we finetune Stable Diffusion on several objectives, including image compressibility, human-perceived aesthetic quality, and prompt-image alignment, the last scored using feedback from a large vision-language model. Our algorithm, denoising diffusion policy optimization (DDPO), frames the denoising process as a multi-step Markov decision process (MDP): each denoising step is treated as an action, so the reward on the final image can be maximized with policy gradients. A minimal sketch of this update follows.
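As an illustration of the MDP framing, the sketch below implements the simplest REINFORCE-style variant of the idea: collect the per-step log-probabilities of a sampled denoising trajectory, score the final image with a reward, and weight each step's log-probability by a normalized advantage. The tensor shapes and helper name are assumptions for illustration; the stronger variant described in the original work additionally uses importance sampling with PPO-style clipping so that samples can be reused across gradient steps.

```python
import torch


def ddpo_reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss over denoising trajectories (illustrative sketch).

    log_probs: (batch, T) log p_theta(x_{t-1} | x_t, c) for each denoising
               step of each sampled trajectory, under the current parameters.
    rewards:   (batch,) reward r(x_0, c) assigned to each final image.
    """
    # Normalize rewards into advantages to reduce gradient variance.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Every step in a trajectory shares that trajectory's advantage;
    # minimizing this loss ascends the expected-reward policy gradient.
    return -(log_probs * advantages.unsqueeze(1)).mean()
```

In a training loop, one would sample images with the current model while recording each step's log-probability, evaluate the reward on the decoded images, and take gradient steps on this loss.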

We found that DDPO significantly outperforms existing algorithms, and that the finetuned Stable Diffusion model generalizes to prompts it was never finetuned on. However, we also observed overoptimization: past a point, the model exploits the reward function to achieve high scores in ways that are not actually useful.

Overall, this work demonstrates that training diffusion models with RL can substantially improve performance on downstream objectives, while highlighting reward overoptimization as an important problem for future work to address.
