This AI Research Proposes TeCH to Reconstruct a Lifelike 3D Clothed Human from a Single Image with Detailed Full-Body Geometry and High-Quality Texture

High-fidelity 3D digital humans are essential for many augmented and virtual reality applications, including gaming, social networking, education, e-commerce, and immersive telepresence. To make it easier to create digital humans from available in-the-wild photos, many methods attempt to reconstruct a 3D clothed human figure from a single photograph. Despite the progress made by earlier techniques, the problem remains ill-posed because non-visible regions are never observed. Prior work predicts invisible parts (such as the back) from visible cues (such as colors and estimated normals), which leads to blurry texture and smoothed-out geometry, so the reconstructions look inconsistent when viewed from different angles. Multi-view supervision is a viable answer to this problem, but is it possible with only one image as input? Here, the authors propose TeCH as a solution. In contrast to past research that primarily studies the connection between visible frontal cues and non-visible areas, TeCH blends textual information extracted from the input picture with a personalized text-to-image diffusion model, DreamBooth, to guide the reconstruction process.

Specifically, they separate the semantic information of the single input image from the subject's distinctive, finely detailed appearance, which is difficult to describe adequately in words:

1) Descriptive semantic prompts, including explicit descriptions of colors, clothing styles, haircuts, and facial traits, are parsed from the input image using a garment parsing model (SegFormer) and a pre-trained visual-language VQA model (BLIP).
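To make this parsing step concrete, here is a minimal sketch of how per-subject attributes (as a VQA model such as BLIP might answer them) could be assembled into a descriptive text prompt. The attribute values and the `build_prompt` helper are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: compose a descriptive prompt from parsed
# appearance attributes. In TeCH, SegFormer segments garments and a
# VQA model (BLIP) answers attribute questions; the values below are
# stand-ins for such answers.

def build_prompt(attributes: dict) -> str:
    """Compose a text prompt from parsed appearance attributes."""
    parts = []
    face = attributes.get("facial features")
    if face:
        parts.append(f"a person with {face}")
    hair = attributes.get("haircut")
    if hair:
        parts.append(f"{hair} hair")
    for garment, desc in attributes.get("garments", {}).items():
        parts.append(f"wearing {desc} {garment}")
    return ", ".join(parts)

# Example attributes as a VQA model might return them:
parsed = {
    "facial features": "a short beard",
    "haircut": "short black",
    "garments": {"jacket": "blue denim", "trousers": "dark grey"},
}
prompt = build_prompt(parsed)
print(prompt)
```

The resulting string can then serve as the text condition for the diffusion model in the next step.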

2) A personalized Text-to-Image (T2I) diffusion model embeds the indescribable appearance information, which implicitly captures the subject's distinctive look and fine-grained characteristics, into a special token “[V]”. To improve the fidelity of the reconstructed 3D human models while preserving their original identity, the 3D human is optimized against these information sources using multi-view Score Distillation Sampling (SDS), reconstruction losses based on the original observation, and regularization from off-the-shelf normal estimators.
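The SDS term mentioned above distills guidance from a frozen diffusion model by comparing its noise prediction against the noise actually added to a rendered view. The following NumPy toy sketch shows the shape of that gradient under a common weighting choice; the weighting `w(t) = 1 - alpha_bar_t` and the toy values are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_gradient(noise_pred, noise, t, alphas_cumprod):
    """Score Distillation Sampling gradient: w(t) * (eps_theta - eps).
    noise_pred is the frozen diffusion model's noise estimate for the
    noised rendering; noise is the actual noise added at timestep t."""
    w = 1.0 - alphas_cumprod[t]          # one common weighting choice
    return w * (noise_pred - noise)

# Toy setup: pretend the model's estimate disagrees slightly with the
# true noise everywhere; the gradient then pushes the rendering toward
# images the model finds more likely under the text prompt.
alphas_cumprod = np.linspace(0.999, 0.01, 1000)
noise = rng.standard_normal((4, 4))
noise_pred = noise + 0.1
grad = sds_gradient(noise_pred, noise, t=500, alphas_cumprod=alphas_cumprod)
```

In practice this gradient is backpropagated through the differentiable renderer into the 3D representation's parameters rather than applied to pixels directly.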

Figure 1 shows how TeCH can create a lifelike 3D clothed human from a single photograph.

Researchers from Zhejiang University, Max Planck Institute for Intelligent Systems, Mohamed bin Zayed University of Artificial Intelligence, and Peking University propose a hybrid 3D representation based on DMTet to express high-resolution geometry at a reasonable cost. To accurately capture the overall body shape, the hybrid representation combines an explicit tetrahedral grid with implicit Signed Distance Function (SDF) and RGB fields. In a two-stage optimization procedure, they first optimize the tetrahedral grid, extract the geometry as a mesh, and then optimize the texture. TeCH thus makes it possible to recreate accurate 3D models of clothed humans with precise full-body geometry and rich textures with consistent color and pattern.
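The core idea of a DMTet-style representation is that the surface is implicit in SDF values stored at tetrahedral grid vertices: wherever an edge's endpoints have opposite signs, the surface crosses that edge, and a mesh vertex is placed by linear interpolation. This toy sketch (a sphere SDF and a single grid edge, both illustrative assumptions) shows that extraction step:

```python
import numpy as np

def sphere_sdf(p, radius=0.5):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p, axis=-1) - radius

def edge_crossing(p0, p1, s0, s1):
    """Place a mesh vertex at the SDF zero-crossing on an edge, as
    marching-tetrahedra-style extraction does on a DMTet grid."""
    t = s0 / (s0 - s1)        # fraction along the edge where sdf == 0
    return p0 + t * (p1 - p0)

# One grid edge straddling the sphere surface: endpoint p0 is inside
# (negative SDF), endpoint p1 is outside (positive SDF).
p0, p1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
s0, s1 = sphere_sdf(p0), sphere_sdf(p1)
vertex = edge_crossing(p0, p1, s0, s1)    # lands on the surface
```

Because the interpolation is differentiable in the SDF values, gradients from rendering losses can flow back into the field, which is what makes the two-stage geometry-then-texture optimization workable.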

As a result, it eases various downstream applications, including character animation, novel-view rendering, and shape and texture editing. TeCH proves more effective at recovering geometric detail in quantitative tests on 3D clothed human datasets covering a wide range of poses (CAPE) and outfits (THuman2.0). TeCH also outperforms state-of-the-art approaches in rendering quality, according to qualitative evaluations on real-world photos and a perceptual study. The code will be publicly available for research purposes.

Check out the Paper. All credit for this research goes to the researchers on this project. Also, don't forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, please follow us on Twitter.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.


