Deep neural networks are central to synthesizing photorealistic images and videos with large-scale generative models. A critical step in turning these models into productive tools for humans is adding control, which enables them to follow human-provided instructions instead of randomly generating data samples. Extensive studies have pursued this goal. For instance, in Generative Adversarial Networks (GANs), a widespread solution is adaptive normalization, which dynamically scales and shifts intermediate feature maps according to the input condition.
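To make the feature-space mechanism concrete, here is a minimal NumPy sketch (with hypothetical names, not the paper's code) of condition-driven adaptive normalization: a learned projection predicts per-channel scale and shift from the condition embedding, and these modulate the normalized feature maps while the network's own weights stay fixed.

```python
import numpy as np

def adaptive_norm(features, cond_embed, w_scale, w_shift, eps=1e-5):
    """Condition-driven scale/shift of feature maps (illustrative sketch).

    features:   (batch, channels, height, width) intermediate feature maps
    cond_embed: (batch, dim) condition embedding (e.g., a class embedding)
    w_scale, w_shift: (dim, channels) learned projections producing
                      per-channel scale (gamma) and shift (beta)
    """
    # Normalize each sample's feature map per channel.
    mean = features.mean(axis=(2, 3), keepdims=True)
    var = features.var(axis=(2, 3), keepdims=True)
    normed = (features - mean) / np.sqrt(var + eps)

    # Predict per-channel modulation from the condition.
    gamma = cond_embed @ w_scale   # (batch, channels)
    beta = cond_embed @ w_shift    # (batch, channels)

    # Broadcast over spatial dims. Note: only the features are
    # manipulated; the layer weights are the same for every condition.
    return (1.0 + gamma)[:, :, None, None] * normed + beta[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8, 8))   # two samples, 4 channels
cond = rng.normal(size=(2, 16))     # two condition embeddings
ws = rng.normal(size=(16, 4)) * 0.1
wb = rng.normal(size=(16, 4)) * 0.1
out = adaptive_norm(x, cond, ws, wb)
```

The key point for what follows: the condition only rescales features, so every condition passes through identical convolution/linear weights.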
Nevertheless, these widely used techniques share the same underlying mechanism: despite differences in the operations, they all add control by manipulating the feature space, while the neural network weights (convolution or linear layers) remain the same across conditions. This raises two critical questions: (a) can image generative models be controlled by manipulating their weights? (b) Can conditionally controlled image generative models benefit from such a new control method? This paper aims to address both questions in an efficient way.
Researchers from MIT, Tsinghua University, and NVIDIA introduce the Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. CAN controls the image generation process by dynamically manipulating the weights of the neural network. To achieve this, a condition-aware weight generation module generates conditional weights for convolution/linear layers based on the input condition. Two insights are critical for CAN: first, making only a subset of modules condition-aware benefits both efficiency and performance; second, directly generating the conditional weights is much more effective than alternative parameterizations.
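The core idea can be sketched as follows, again in NumPy with hypothetical names rather than the authors' implementation: a weight-generation module maps each sample's condition embedding to the weight matrix of a linear layer, so different conditions run through genuinely different weights instead of a shared layer.

```python
import numpy as np

def can_linear(x, cond_embed, w_gen, in_dim, out_dim):
    """Condition-aware linear layer (illustrative sketch of CAN's idea).

    Instead of modulating features, the condition generates the layer's
    weights: a weight-generation projection maps the condition embedding
    to a full (in_dim x out_dim) weight matrix per sample.

    x:          (batch, tokens, in_dim) input features
    cond_embed: (batch, cond_dim) condition embedding
    w_gen:      (cond_dim, in_dim * out_dim) weight-generation projection
    """
    batch = x.shape[0]
    # Generate one weight matrix per sample from its condition.
    w = (cond_embed @ w_gen).reshape(batch, in_dim, out_dim)
    # Batched matmul: each sample is processed by its own weights.
    return np.einsum("bti,bio->bto", x, w)

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 5, 8))                    # batch of token sequences
cond = np.stack([np.eye(10)[3], np.eye(10)[7]])   # two one-hot class conditions
w_gen = rng.normal(size=(10, 8 * 4)) * 0.1
y = can_linear(x, cond, w_gen, in_dim=8, out_dim=4)
```

In practice only selected layers would be made condition-aware (the paper's first insight); the remaining layers keep ordinary shared weights, which keeps the overhead small.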
CAN is evaluated on two representative diffusion transformer models, DiT and UViT. It delivers significant performance boosts for these models while incurring negligible computational cost. CAN's contributions include:
- A new mechanism for controlling image generative models, demonstrating the effectiveness of weight manipulation for conditional control.
- A practical conditional control method, made usable by the design insights above, that outperforms prior conditional control methods by a significant margin.
- Benefits for deployment: CAN achieves a better FID on ImageNet 512×512 while using 52× fewer MACs per sampling step than DiT-XL/2.
Instead of directly generating the conditional weights, Adaptive Kernel Selection (AKS) is another possible approach: it maintains a set of base convolution kernels and dynamically generates scaling parameters to combine them. AKS has a smaller parameter overhead than CAN; nevertheless, it cannot match CAN's performance, which shows that dynamic parameterization alone is not the key to better results. Furthermore, CAN is tested on class-conditional image generation on ImageNet and text-to-image generation on COCO, yielding significant improvements for diffusion transformer models.
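For contrast with direct weight generation, here is a hedged NumPy sketch of the AKS-style alternative (hypothetical names): the condition produces only K mixing coefficients, and the per-sample weight is a convex combination of K shared base kernels.

```python
import numpy as np

def aks_weight(cond_embed, base_kernels, w_coef):
    """Adaptive Kernel Selection (illustrative sketch).

    Rather than generating a full weight matrix, AKS keeps K shared base
    kernels and generates K mixing coefficients per sample, so the
    condition-dependent overhead is only cond_dim x K parameters.

    cond_embed:   (batch, cond_dim) condition embedding
    base_kernels: (K, in_dim, out_dim) shared, condition-independent
    w_coef:       (cond_dim, K) projection producing mixing logits
    """
    logits = cond_embed @ w_coef                       # (batch, K)
    # Softmax so the coefficients form a convex combination.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    coef = e / e.sum(axis=1, keepdims=True)
    # Per-sample weight = coefficient-weighted sum of base kernels.
    return np.einsum("bk,kio->bio", coef, base_kernels)

rng = np.random.default_rng(2)
cond = rng.normal(size=(2, 10))
bases = rng.normal(size=(4, 8, 6))   # K=4 base kernels
w_coef = rng.normal(size=(10, 4))
w = aks_weight(cond, bases, w_coef)
```

The generated weights are restricted to the span of the K base kernels, which illustrates why AKS is cheaper in parameters but, per the paper's findings, less expressive than generating the weights directly.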
In conclusion, CAN is a new conditional control method for image generative models. Its effectiveness is demonstrated through experiments on class-conditional generation (ImageNet) and text-to-image generation (COCO), with consistent and significant improvements over prior conditional control methods. In addition, a new family of diffusion transformer models was built by combining CAN with EfficientViT. Future work includes applying CAN to more challenging tasks such as large-scale text-to-image generation and video generation.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.
If you like our work, you will love our newsletter.
Don't forget to join our 39k+ ML SubReddit.
Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.