
Learn How to Generate 3D Avatars from 2D Image Collections with this Novel AI Technique

Generative models, such as Generative Adversarial Networks (GANs), can generate lifelike images of objects and dressed individuals after being trained on an extensive image collection. Although the resulting output is a 2D image, many applications require diverse, high-quality virtual 3D avatars. These avatars should allow pose and camera viewpoint control while ensuring 3D consistency. To meet the demand for 3D avatars, the research community is exploring generative models capable of automatically producing 3D shapes of humans and clothing from input parameters such as body pose and shape. Despite considerable advancements, most existing methods overlook texture and depend on precise, clean 3D scans of humans for training. Acquiring such scans is expensive, limiting their availability and diversity.

Learning to generate 3D human shapes and textures from unstructured image data is a challenging, under-constrained problem: each training instance exhibits a unique shape and appearance, observed only once from a specific viewpoint and pose. While recent progress in 3D-aware GANs has shown impressive results for rigid objects, these methods struggle to generate realistic humans due to the complexity of human articulation. Although some recent work demonstrates the feasibility of learning articulated humans, existing approaches suffer from limited quality and resolution, and have difficulty modeling loose clothing.

The paper covered in this article introduces a novel method for 3D human generation from 2D image collections, achieving state-of-the-art image and geometry quality while effectively modeling loose clothing.

An overview of the proposed method is illustrated below.

This method adopts a holistic design capable of modeling both the human body and loose clothing, departing from approaches that represent humans as separate body parts. Multiple discriminators are incorporated to enhance geometric detail and to focus on perceptually important regions.
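To make the multi-discriminator idea concrete, the sketch below aggregates non-saturating generator losses from several discriminators (e.g., one on the full RGB image, one on a face crop) into a single weighted objective. The discriminator names and weights are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def combined_discriminator_loss(scores, weights):
    """Combine non-saturating GAN generator losses from several
    discriminators (e.g. full image, face crop, normal map) into one
    scalar objective. `scores` maps discriminator name -> array of
    raw logits on generated samples; `weights` maps name -> weight.
    Names and weighting here are illustrative, not the paper's code."""
    total = 0.0
    for name, score in scores.items():
        # softplus(-score) = log(1 + exp(-score)): the per-discriminator
        # non-saturating generator loss, averaged over the batch.
        total += weights[name] * np.log1p(np.exp(-score)).mean()
    return total
```

In practice each discriminator sees a different view of the generated output, so weighting lets perceptually important regions (such as the face) contribute more to the gradient signal.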

A novel generator design is proposed to achieve high image quality together with flexible handling of loose clothing, modeling 3D humans holistically in a canonical space. The articulation module, Fast-SNARF, is responsible for the movement and positioning of body parts and is adapted to the generative setting. Moreover, the model adopts empty-space skipping, accelerating the rendering of areas with no significant content and improving overall efficiency.
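The empty-space-skipping idea can be sketched as follows: sample points along a camera ray, look each point up in a coarse occupancy grid, and only pass points in occupied cells to the expensive network query. This is a generic illustration of the technique, under assumed grid and ray parameters, not the paper's implementation.

```python
import numpy as np

def march_ray_with_skipping(origin, direction, occupancy_grid, cell_size, n_steps=64):
    """Sample points along a ray and discard those falling in cells that a
    coarse boolean occupancy grid marks as empty. Only the surviving points
    would be evaluated by the (costly) neural field. Illustrative sketch."""
    ts = np.linspace(0.0, 2.0, n_steps)           # ray parameters to probe
    points = origin + ts[:, None] * direction     # (n_steps, 3) sample positions
    # Map each point to a cell index in the coarse grid, clamped to bounds.
    idx = np.clip((points / cell_size).astype(int), 0,
                  np.array(occupancy_grid.shape) - 1)
    occupied = occupancy_grid[idx[:, 0], idx[:, 1], idx[:, 2]]
    # Keep only samples inside occupied cells; empty space is skipped.
    return points[occupied]
```

Since most of the volume around a human body is empty, pruning samples this way cuts the number of network evaluations per ray substantially.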

The modular 2D discriminators are guided by normal information, meaning they consider the directionality of surfaces in 3D space. This guidance helps the model focus on regions that are perceptually important to human observers, contributing to a more accurate and visually pleasing result. Moreover, the discriminators prioritize geometric details, enhancing the overall quality of the generated images and yielding a more realistic representation of the 3D human models.
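The normal information that guides such discriminators can be derived from a rendered depth map. The sketch below estimates per-pixel surface normals via finite differences on depth; this is a standard, illustrative construction, not the paper's exact pipeline.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a rendered depth map using
    finite differences. The resulting normal map is the kind of geometric
    signal a normal-guided discriminator can be conditioned on.
    Illustrative sketch, not the paper's code."""
    # Depth gradients along image rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    # Unnormalized normals for a depth surface z(x, y): (-dz/dx, -dz/dy, 1).
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(depth)])
    # Normalize to unit length per pixel.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals
```

Feeding such normal maps to a discriminator penalizes geometry that looks flat or noisy even when the RGB rendering appears plausible.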

The experimental results show a significant improvement of the proposed method over previous 3D- and articulation-aware methods in terms of geometry and texture quality, validated quantitatively, qualitatively, and through perceptual studies.

In summary, this contribution comprises a generative model of articulated 3D humans with state-of-the-art appearance and geometry, an efficient generator that handles loose clothing, and specialized discriminators that enhance visual and geometric fidelity. The authors plan to release the code and models for further exploration.

Check out the Paper and Project Page. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA, and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.
