
The creative industries have entered a new era of possibilities with the arrival of generative models: computational tools capable of producing text or images based on patterns learned from training data. Building on these advances, researchers from Stanford University, UC Berkeley, and Adobe Research have introduced a novel model that can seamlessly insert specific people into different scenes with impressive realism.
The researchers used a self-supervised approach to train a diffusion model, a class of generative model that converts noise into images by learning to reverse a process that gradually destroys the training data. The model was trained on videos of humans moving through various scenes, with two frames sampled at random from each video. The human in the first frame was masked out, and the model used the unmasked person in the second frame as a conditioning signal to realistically reconstruct the person in the masked frame.
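The two-frame training setup described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_training_pair`, the array shapes, and the bounding-box crop for the conditioning signal are all hypothetical. In the actual paper, a diffusion network is trained to denoise toward the original frame given inputs like these.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(video, mask_a, person_box_b):
    """Build one self-supervised training example from a video clip.

    video:        frames as an array of shape (T, H, W, C)
    mask_a:       boolean human mask for the first sampled frame, shape (H, W)
    person_box_b: (y0, y1, x0, x1) crop of the same person in the second frame
    """
    # Sample two distinct frames at random from the clip
    t_a, t_b = rng.choice(len(video), size=2, replace=False)
    frame_a, frame_b = video[t_a], video[t_b]

    # Mask out the person in frame A; this region is the reconstruction target
    masked_a = frame_a.copy()
    masked_a[mask_a] = 0.0

    # Crop the (unmasked) person from frame B as the conditioning signal
    y0, y1, x0, x1 = person_box_b
    condition = frame_b[y0:y1, x0:x1]

    # A diffusion model would be trained to reconstruct frame_a
    # given (masked_a, mask_a, condition)
    return masked_a, mask_a, condition, frame_a
```

Because both frames come from the same video, the conditioning person and the masked region depict the same identity in different poses, which is what lets the model learn scene-appropriate re-posing without any manual labels.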
Through this training process, the model learned to infer plausible poses from scene context, re-pose the person, and blend them seamlessly into the scene. The researchers found that the model performed exceptionally well at placing people in scenes, producing edited images that appear highly realistic. Its predictions of affordances, the perceived possibilities for action or interaction within an environment, outperformed previously introduced non-generative models.
The findings hold significant potential for future research in affordance perception and related areas. They could advance robotics research by helping systems identify potential interaction opportunities. The model also has practical applications in creating realistic media, including images and videos: integrating it into creative software could enhance image-editing functionality for artists and media creators, and it could be incorporated into smartphone photo-editing apps, letting users easily and realistically insert people into their photographs.
The researchers have identified several avenues for future work. They aim to add finer control over generated poses and to extend the approach from static images to realistic human motion within scenes. They also plan to improve the model's efficiency and generalize the method beyond humans to arbitrary objects.
In conclusion, the researchers have introduced a new model for realistically inserting humans into scenes. Built on generative diffusion models and self-supervised training, it demonstrates strong affordance perception and holds promise for applications across the creative industries and robotics research. Future work will focus on refining and expanding the model's capabilities.
Check Out The Paper. Don't forget to join our 22k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
🚀 Check Out 100’s AI Tools in AI Tools Club
Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.