
Diffusion models are a leading approach to image generation, producing images by iteratively denoising a sample over many time steps. The UNet encoder inside diffusion models has recently come under close scrutiny, revealing distinctive patterns in how its features change during inference. Building on that observation, an encoder propagation scheme speeds up diffusion sampling by reusing encoder features from earlier time steps, which also makes parallel processing possible.
Researchers from Nankai University, Mohamed bin Zayed University of AI, Linkoping University, Harbin Engineering University, and Universitat Autonoma de Barcelona examined the UNet encoder in diffusion models. They introduced an encoder propagation scheme and a previous noise injection method to enhance image quality. The proposed method preserves structural information effectively; by contrast, encoder dropping and decoder dropping fail to achieve complete denoising.
Originally designed for medical image segmentation, UNet has evolved, especially in 3D medical image segmentation. In text-to-image diffusion models like Stable Diffusion (SD) and DeepFloyd-IF, UNet is pivotal in advancing tasks such as image editing, super-resolution, segmentation, and object detection. The study proposes an approach to speed up diffusion models, employing encoder propagation and dropping for efficient sampling. Compared with ControlNet, the proposed method applies to two encoders concurrently, reducing generation time and computational load while preserving content in text-guided image generation.
Diffusion models, integral to text-to-video and reference-guided image generation, use the UNet architecture, which comprises an encoder, a bottleneck, and a decoder. While past research focused on the UNet decoder, this study offers the first in-depth examination of the UNet encoder in diffusion models. It explores how encoder and decoder features change during inference and introduces an encoder propagation scheme for accelerated diffusion sampling.
The study proposes an encoder propagation scheme that reuses encoder features from previous time steps to speed up diffusion sampling. It also introduces a previous noise injection method to reinforce texture details in generated images. The resulting approach accelerates diffusion sampling without relying on knowledge distillation techniques.
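The idea of reusing encoder features can be illustrated with a minimal sketch. This is not the authors' implementation: the toy encoder, decoder, and update functions, the scalar latent, and the choice of key time steps are all hypothetical stand-ins, chosen only to show that the encoder runs at a few key steps while the decoder runs at every step.

```python
from typing import Iterable, Set, Tuple


def toy_encoder(z: float, t: int) -> float:
    """Hypothetical stand-in for the UNet encoder: latent -> features."""
    return 0.5 * z


def toy_decoder(features: float, t: int) -> float:
    """Hypothetical stand-in for the UNet decoder: features and t -> noise."""
    return features + 0.01 * t


def toy_update(z: float, eps: float, t: int) -> float:
    """Hypothetical stand-in for one sampler update step."""
    return z - 0.1 * eps


def sample(z: float, timesteps: Iterable[int],
           key_timesteps: Set[int]) -> Tuple[float, int]:
    """Run the encoder only at key time steps and reuse its output elsewhere.

    Returns the final latent and the number of encoder evaluations.
    """
    cached = None
    encoder_calls = 0
    for t in timesteps:
        # Recompute encoder features at key steps (and at the very first step).
        if cached is None or t in key_timesteps:
            cached = toy_encoder(z, t)
            encoder_calls += 1
        # The decoder still runs at every step, fed with the cached features.
        eps = toy_decoder(cached, t)
        z = toy_update(z, eps, t)
    return z, encoder_calls


if __name__ == "__main__":
    steps = list(range(9, -1, -1))
    _, full_calls = sample(1.0, steps, set(steps))   # encoder at every step
    _, fast_calls = sample(1.0, steps, {9, 6, 3})    # encoder at 3 key steps
    print(full_calls, fast_calls)  # prints: 10 3
```

In this sketch the saving comes purely from skipped encoder evaluations; how the key time steps are chosen in practice is a design decision the sketch does not address.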
The research thoroughly investigates the UNet encoder in diffusion models, revealing minor changes in encoder features and substantial variations in decoder features during inference. The encoder propagation scheme, which cyclically reuses encoder features from previous time steps as input to the decoder, accelerates diffusion sampling and enables parallel processing. A previous noise injection method enhances texture details in generated images. The approach is validated across various tasks, accelerating SD and DeepFloyd-IF sampling by 41% and 24%, respectively, while maintaining high-quality generation. A user study with 18 users, based on pairwise comparisons, indicates that the proposed method performs comparably to baseline methods.
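One way to read the parallel-processing claim is that, once encoder features are shared across a group of adjacent time steps, the decoder's noise predictions for that group no longer depend on each other and can be computed in a single batch. The sketch below illustrates this reading under stated assumptions: the toy functions, the scalar latent, and the grouping of time steps are hypothetical and not taken from the paper or its code.

```python
from typing import List, Sequence


def encode(z: float, t: int) -> float:
    """Hypothetical stand-in for the UNet encoder."""
    return 0.5 * z


def decode_batch(features: float, ts: Sequence[int]) -> List[float]:
    """Hypothetical decoder applied to several time steps in one call."""
    return [features + 0.01 * t for t in ts]


def update(z: float, eps: float, t: int) -> float:
    """Hypothetical stand-in for one sampler update step."""
    return z - 0.1 * eps


def sample_sequential(z: float, groups: Sequence[Sequence[int]]) -> float:
    """Decode one time step at a time, reusing features within each group."""
    for group in groups:
        features = encode(z, group[0])  # encoder runs once per group
        for t in group:
            eps = decode_batch(features, [t])[0]
            z = update(z, eps, t)
    return z


def sample_parallel(z: float, groups: Sequence[Sequence[int]]) -> float:
    """Decode all time steps of a group in one batch, then apply updates."""
    for group in groups:
        features = encode(z, group[0])
        eps_all = decode_batch(features, group)  # one batched decoder call
        for t, eps in zip(group, eps_all):
            z = update(z, eps, t)
    return z


if __name__ == "__main__":
    groups = [[9, 8, 7], [6, 5, 4], [3, 2, 1, 0]]
    print(sample_sequential(1.0, groups), sample_parallel(1.0, groups))
```

Both functions produce the same latent in this toy setting; the batched variant simply replaces several decoder calls with one, which is where a real model could exploit parallel hardware.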
In conclusion, the study can be summarized in the following points:
- The research presents the first comprehensive study of the UNet encoder in diffusion models.
- The study examines changes in encoder features during inference.
- An encoder propagation scheme accelerates diffusion sampling by cyclically reusing encoder features, allowing for parallel processing.
- A noise injection method enhances texture details in generated images.
- The approach has been validated across diverse tasks and exhibits significant sampling acceleration for SD and DeepFloyd-IF models without knowledge distillation while maintaining high-quality generation.
- The FasterDiffusion code release enhances reproducibility and encourages further research in the field.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is obsessed with applying technology and AI to handle real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.