A year ago, generating realistic images with AI was a dream. We were impressed to see generated faces that resembled real ones, even though the vast majority of outputs had three eyes, two noses, etc. Things changed quite rapidly with the release of diffusion models. Nowadays, it is difficult to distinguish an AI-generated image from a real one.
The ability to generate high-quality images is one part of the equation. If we are to use these images properly, compressing them efficiently plays an essential role in tasks such as content generation, data storage, transmission, and bandwidth optimization. However, image compression has predominantly relied on traditional methods like transform coding and quantization, with limited exploration of generative models.
Despite their success in image generation, diffusion models and score-based generative models have not yet emerged as the leading approaches for image compression, lagging behind GAN-based methods. They often perform worse than, or only on par with, GAN-based approaches like HiFiC on high-resolution images. Even attempts to repurpose text-to-image models for image compression have yielded unsatisfactory results, producing reconstructions that deviate from the original input or contain undesirable artifacts.
The gap between the performance of score-based generative models in image generation and their limited success in image compression raises intriguing questions and motivates further investigation. It is surprising that models capable of generating high-quality images have not been able to surpass GANs in the specific task of image compression. This discrepancy suggests that there may be unique challenges and considerations when applying score-based generative models to compression tasks, necessitating specialized approaches to harness their full potential.
So we know there is potential for using score-based generative models in image compression. The question is, how can it be done? Let us jump into the answer.
Google researchers proposed a method that combines a standard autoencoder, optimized for mean squared error (MSE), with a diffusion process that recovers and adds the fine details discarded by the autoencoder. The bit rate for encoding an image is determined solely by the autoencoder, since the diffusion process requires no additional bits. By fine-tuning diffusion models specifically for image compression, the authors show that they can outperform several recent generative approaches in terms of image quality.
The method explores two closely related approaches: diffusion models, which exhibit impressive performance but require a large number of sampling steps, and rectified flows, which perform better when only a few sampling steps are allowed.
The two-step approach consists of first encoding the input image with the MSE-optimized autoencoder and then applying either the diffusion process or rectified flows to enhance the realism of the reconstruction. The diffusion model employs a noise schedule that is shifted in the opposite direction compared to text-to-image models, prioritizing detail over global structure. The rectified flow model, on the other hand, leverages the pairing provided by the autoencoder to map autoencoder outputs directly to uncompressed images.
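The two-step idea can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `autoencode` stands in for the MSE-optimized compressive autoencoder (here a crude quantizer mimics its information loss), and `refine` mimics a few-step rectified-flow sampler, where `toy_velocity` is a placeholder for the trained network that would map coarse reconstructions toward sharp images.

```python
import numpy as np

def autoencode(image, levels=8):
    """Stand-in for an MSE-optimized compressive autoencoder.
    Quantization mimics the information loss; a real codec would use
    learned analysis/synthesis transforms plus entropy coding."""
    return np.round(image * levels) / levels

def refine(coarse, velocity_fn, steps=4):
    """Few-step refinement in the spirit of a rectified flow: integrate
    a velocity field from the coarse reconstruction toward a realistic
    image. `velocity_fn` is a placeholder for a trained network."""
    x = coarse.astype(np.float64)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + velocity_fn(x, t) * dt  # one Euler integration step
    return x

def toy_velocity(x, t):
    # In practice, a neural network trained on pairs of
    # (autoencoder output, original image); identity stand-in here.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
coarse = autoencode(image)                # bit rate fixed by the autoencoder alone
restored = refine(coarse, toy_velocity)   # adds no bits, only decoding compute
```

The key property the sketch highlights is that the second stage consumes no extra bits: everything transmitted comes from the autoencoder, and the flow or diffusion stage only spends compute at decoding time.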
Furthermore, the study reveals specific details that could be useful for future research in this domain. For instance, it shows that the noise schedule and the amount of noise injected during image generation significantly impact the results. Interestingly, while text-to-image models benefit from increased noise levels when training on high-resolution images, reducing the overall noise of the diffusion process turns out to be advantageous for compression. This adjustment allows the model to focus more on fine details, since the coarse details are already adequately captured by the autoencoder reconstruction.
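One common way to express such a shift, sketched below under the assumption of a cosine schedule, is in log signal-to-noise ratio (log-SNR) space: adding a constant offset raises the SNR at every timestep, i.e. reduces the overall noise, which matches the compression setting where coarse structure already comes from the autoencoder. The `shift` value here is purely illustrative, not taken from the paper.

```python
import numpy as np

def cosine_logsnr(t):
    # Standard cosine noise schedule expressed as log-SNR over t in (0, 1).
    return -2.0 * np.log(np.tan(np.pi * t / 2))

def shifted_logsnr(t, shift=4.0):
    # Shifting log-SNR upward by 2*log(shift) means less noise at every
    # timestep, biasing the model toward fine detail rather than
    # global structure. `shift` is an illustrative value.
    return cosine_logsnr(t) + 2.0 * np.log(shift)

t = np.linspace(0.01, 0.99, 5)
delta = shifted_logsnr(t) - cosine_logsnr(t)  # constant offset 2*log(shift)
```

Text-to-image models trained at high resolution move the schedule in the opposite direction (more noise), which is why the paper describes the compression schedule as shifted the other way.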
Check Out The Paper. Don’t forget to join our 24k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising with deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled “Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning.” His research interests include deep learning, computer vision, video encoding, and multimedia networking.