Diffusion models have revolutionized generative modeling across various data types. However, in practical applications such as generating aesthetically pleasing images from text descriptions, fine-tuning is usually needed. Text-to-image diffusion models employ techniques like classifier-free guidance and curated datasets such as LAION Aesthetics to enhance alignment and image quality.
In their research, the authors present a simple and efficient method for gradient-based reward fine-tuning, which involves differentiating through the diffusion sampling process. They introduce Direct Reward Fine-Tuning (DRaFT), which backpropagates through the complete sampling chain, typically represented as an unrolled computation graph of 50 steps. To keep memory and compute costs manageable, they employ gradient checkpointing and optimize LoRA weights instead of modifying the full set of model parameters.
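To make the core recipe concrete, here is a minimal PyTorch sketch of the idea: sample from the model, score the result with a differentiable reward, and backpropagate through the whole chain. The tiny denoiser, dummy `reward_fn`, noise schedule, and simplified update rule are stand-ins invented for illustration, not the paper's Stable Diffusion setup, and for brevity the whole stand-in network is trained here rather than LoRA adapters.

```python
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Stand-in denoiser: a small MLP predicting the noise residual (hypothetical).
denoiser = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 16)
)

def reward_fn(x):
    # Dummy differentiable reward; a real setup would use e.g. an aesthetic scorer.
    return -(x ** 2).mean()

# DRaFT trains LoRA adapter weights; this sketch trains the whole stand-in net instead.
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

T = 50                                   # length of the unrolled sampling chain
alphas = torch.linspace(0.99, 0.90, T)   # toy noise schedule

x = torch.randn(8, 16)                   # start from pure noise
for t in range(T):
    # Gradient checkpointing: recompute activations during backward instead of storing them.
    eps = checkpoint(denoiser, x, use_reentrant=False)
    x = (x - (1 - alphas[t]).sqrt() * eps) / alphas[t].sqrt()   # simplified DDIM-style update

loss = -reward_fn(x)                     # maximize reward = minimize its negative
loss.backward()                          # gradient flows through all T steps
opt.step()
```

The checkpointing trades extra forward computation for memory, which is what makes unrolling all 50 steps feasible in the first place.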
The above image demonstrates DRaFT using human preference reward models. The authors also introduce enhancements to DRaFT that improve its efficiency and performance. First, they propose DRaFT-K, a variant that limits backpropagation to only the last K steps of sampling when computing the gradient for fine-tuning. Empirical results show that this truncated-gradient approach significantly outperforms full backpropagation with the same number of training steps, as full backpropagation can lead to exploding gradients.
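A hypothetical sketch of that truncation is below: the first T − K steps run under `torch.no_grad()`, so only the final K steps contribute to the gradient. The stand-in denoiser, reward, and update rule are illustrative only, mirroring the previous snippet.

```python
import torch

torch.manual_seed(0)
denoiser = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 16))
reward_fn = lambda x: -(x ** 2).mean()           # dummy differentiable reward
T, K = 50, 1                                     # K = 1 corresponds to DRaFT-1
alphas = torch.linspace(0.99, 0.90, T)
step = lambda x, eps, t: (x - (1 - alphas[t]).sqrt() * eps) / alphas[t].sqrt()

x = torch.randn(8, 16)
with torch.no_grad():                            # first T - K steps: no graph is kept
    for t in range(T - K):
        x = step(x, denoiser(x), t)

for t in range(T - K, T):                        # last K steps: differentiable
    x = step(x, denoiser(x), t)

loss = -reward_fn(x)
loss.backward()                                  # gradient reaches only the last K steps
```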
Moreover, the authors introduce DRaFT-LV, a variant of DRaFT-1 (DRaFT-K with K = 1) that computes lower-variance gradient estimates by averaging over multiple noise samples, further improving the efficiency of their approach.
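The following sketch illustrates one reading of that averaging step: the chain is sampled without gradients up to the penultimate latent, which is then re-noised n times, and the final denoising step plus reward are averaged over those n samples. Again, the denoiser, reward, and schedule are toy stand-ins rather than the paper's implementation.

```python
import torch

torch.manual_seed(0)
denoiser = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.SiLU(), torch.nn.Linear(64, 16))
reward_fn = lambda x: -(x ** 2).mean()           # dummy differentiable reward
T, n = 50, 4                                     # n = number of noise samples averaged per update
alphas = torch.linspace(0.99, 0.90, T)
step = lambda x, eps, t: (x - (1 - alphas[t]).sqrt() * eps) / alphas[t].sqrt()

with torch.no_grad():                            # sample up to the penultimate latent, no graph
    x = torch.randn(8, 16)
    for t in range(T - 1):
        x = step(x, denoiser(x), t)

loss = 0.0
for _ in range(n):                               # re-noise and redo the last step n times
    x_noisy = alphas[-1].sqrt() * x + (1 - alphas[-1]).sqrt() * torch.randn_like(x)
    x_final = step(x_noisy, denoiser(x_noisy), T - 1)
    loss = loss + (-reward_fn(x_final)) / n      # averaging lowers gradient variance

loss.backward()
```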
The authors applied DRaFT to Stable Diffusion 1.4 and evaluated it using various reward functions and prompt sets. Their gradient-based methods demonstrated significant efficiency gains over RL-based fine-tuning baselines. For example, they achieved over a 200-fold speedup compared with RL algorithms when maximizing scores from the LAION Aesthetics Classifier.
DRaFT-LV, one of their proposed variants, exhibited exceptional efficiency, learning roughly twice as fast as ReFL, a previous gradient-based fine-tuning method. Moreover, they demonstrated the flexibility of DRaFT by combining or interpolating fine-tuned models with the pre-trained model, which can be achieved by mixing or scaling the LoRA weights.
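As a rough illustration of that interpolation idea, the snippet below scales or blends hypothetical LoRA adapter dictionaries; the tensor names and shapes are invented for this example and are not an API from the paper's code or any particular library.

```python
import torch

# Hypothetical LoRA adapter weights produced by DRaFT fine-tuning.
lora_state = {"up_proj": torch.randn(64, 4), "down_proj": torch.randn(4, 64)}

def scale_lora(lora, alpha):
    # alpha = 0 recovers the pre-trained model, alpha = 1 the fully fine-tuned one;
    # intermediate values interpolate between the two behaviors.
    return {name: alpha * w for name, w in lora.items()}

def mix_loras(lora_a, lora_b, alpha):
    # Blend two adapters, e.g. ones fine-tuned against different reward functions.
    return {name: alpha * lora_a[name] + (1 - alpha) * lora_b[name] for name in lora_a}

halfway = scale_lora(lora_state, 0.5)
```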
In conclusion, directly fine-tuning diffusion models on differentiable rewards offers a promising avenue for improving generative models, with implications for applications spanning images, text, and more. Its efficiency, versatility, and effectiveness make it a valuable addition to the toolkit of researchers and practitioners in the field of machine learning and generative modeling.
Check out the Paper. All credit for this research goes to the researchers on this project.
Janhavi Lande is an Engineering Physics graduate from IIT Guwahati, class of 2023. She is an upcoming data scientist and has been working in the world of ML/AI research for the past two years. She is most fascinated by this ever-changing world and its constant demand for humans to keep up with it. In her spare time she enjoys traveling, reading, and writing poems.