
Generative AI is a term all of us are accustomed to nowadays. Generative models have advanced considerably in recent years and have become a key tool in many applications.
The stars of the generative AI show are diffusion models. They have emerged as a powerful class of generative models, revolutionizing image synthesis and related tasks. These models have shown remarkable performance in generating high-quality and diverse images. Unlike traditional generative models such as GANs and VAEs, diffusion models work by iteratively refining a noise sample, allowing for stable and coherent image generation.
Diffusion models have gained significant traction due to their ability to generate high-fidelity images with enhanced stability and reduced mode collapse during training. This has led to their widespread adoption across diverse domains, including image synthesis, inpainting, and style transfer.
Nevertheless, they aren't perfect. Despite their impressive capabilities, one of the challenges with diffusion models lies in effectively steering the model toward specific desired outputs based on textual descriptions. It is often frustrating to describe your preferences precisely through text prompts; sometimes they are simply not enough, or the model insists on ignoring them. So you often have to refine the generated image to make it usable.
But only you know what you wanted the model to draw. So, in theory, you are the best person to judge the quality of the generated image and how closely it matches your imagination. What if we could integrate this feedback into the image generation pipeline so the model could understand what we wanted to see? Time to meet FABRIC.
FABRIC (Feedback via Attention-Based Reference Image Conditioning) is a novel approach that enables the integration of iterative feedback into the generative process of diffusion models.
FABRIC utilizes positive and negative feedback images gathered from previous generations or human input, allowing it to leverage reference image conditioning to refine future results. This iterative workflow facilitates the fine-tuning of generated images based on user preferences, providing a more controllable and interactive text-to-image generation process.
FABRIC is inspired by ControlNet, which introduced the ability to generate new images similar to reference images. FABRIC leverages the self-attention module in the U-Net, allowing it to "attend" to other pixels in the image and inject additional information from a reference image. The keys and values for reference injection are computed by passing the noised reference image through the U-Net of Stable Diffusion. These keys and values are stored in the self-attention layers of the U-Net, allowing the denoising process to attend to the reference image and incorporate its semantic information.
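To make the mechanism concrete, here is a minimal sketch of reference-conditioned self-attention, not the official FABRIC implementation: keys and values computed from the noised reference image are cached during a separate pass and appended to the current image's keys and values, so each denoising step can also attend to the reference. The class and attribute names (`ReferenceSelfAttention`, `cached_ref_kv`) and the single-head layout are illustrative assumptions.

```python
import torch

class ReferenceSelfAttention(torch.nn.Module):
    """Illustrative self-attention layer that can also attend to a cached reference image."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = torch.nn.Linear(dim, dim, bias=False)
        self.to_k = torch.nn.Linear(dim, dim, bias=False)
        self.to_v = torch.nn.Linear(dim, dim, bias=False)
        self.cached_ref_kv = None  # filled during the reference-image pass

    def cache_reference(self, ref_hidden_states: torch.Tensor) -> None:
        # Pass 1: run the noised reference image's features through the layer and keep its K/V.
        self.cached_ref_kv = (self.to_k(ref_hidden_states),
                              self.to_v(ref_hidden_states))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pass 2: ordinary self-attention, but with the reference tokens appended,
        # so the denoising step can "look at" the reference image as well.
        q = self.to_q(hidden_states)
        k = self.to_k(hidden_states)
        v = self.to_v(hidden_states)
        if self.cached_ref_kv is not None:
            ref_k, ref_v = self.cached_ref_kv
            k = torch.cat([k, ref_k], dim=1)
            v = torch.cat([v, ref_v], dim=1)
        scores = (q @ k.transpose(-1, -2)) / (q.shape[-1] ** 0.5)
        return scores.softmax(dim=-1) @ v
```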
Furthermore, FABRIC is extended to incorporate multi-round positive and negative feedback, where separate U-Net passes are performed for each liked and disliked image, and the attention scores are reweighted based on the feedback. The feedback process can be scheduled across denoising steps, allowing for iterative refinement of the generated images.
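The sketch below shows one way such feedback could be scheduled over denoising steps and turned into attention reweighting factors for liked versus disliked references. The step-fraction cutoff and the parameter names (`feedback_fraction`, `pos_scale`, `neg_scale`) are illustrative assumptions, not the paper's exact hyperparameters.

```python
def feedback_schedule(step: int, num_steps: int, feedback_fraction: float = 0.8) -> float:
    """Apply feedback only during the first `feedback_fraction` of denoising steps (assumed schedule)."""
    progress = step / max(num_steps - 1, 1)
    return 1.0 if progress <= feedback_fraction else 0.0

def reference_weights(step: int, num_steps: int,
                      pos_scale: float = 0.8, neg_scale: float = 0.5):
    """Return multiplicative attention weights for liked and disliked reference images."""
    s = feedback_schedule(step, num_steps)
    liked_weight = 1.0 + s * pos_scale                # boost attention toward liked images
    disliked_weight = max(1.0 - s * neg_scale, 0.0)   # suppress attention toward disliked images
    return liked_weight, disliked_weight

# Example: feedback is active early in denoising and switched off near the end.
for step in (0, 10, 45):
    print(step, reference_weights(step, num_steps=50))
```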
Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.