Generative AI is the term of the moment in AI. Everyone seems to be talking about it, and it keeps getting more impressive. With each passing day, the capabilities of AI models in generating realistic, high-quality content continue to impress. For instance, we have seen AI models that can generate photorealistic 2D images from simple text prompts, and that is just the tip of the iceberg. As the generative AI trend continues to gain momentum, we are seeing even more impressive results, such as AI models that can generate videos, text, and even music with incredible realism.
One of the major breakthroughs in this area has been the development of diffusion models, which have enabled AI models to produce realistic outputs that were previously thought unattainable. We also cannot forget the development of large-scale datasets, which was an essential part of the success of diffusion models.
The quality of 2D generation has reached a point where it is becoming difficult to distinguish AI-generated images from real ones. However, once we add a third dimension and move to 3D generation, the same cannot be said, unfortunately. 3D generative models are still inferior to their 2D counterparts.
3D modeling has a significantly larger output space and requires far more work. Ensuring consistency in 3D is an extremely difficult task, and on top of that, the lack of a large-scale text-to-3D dataset makes training a generative model simply infeasible. Therefore, existing attempts worked around these requirements by deforming template shapes using a CLIP objective, but the resulting 3D shapes were unsatisfactory in both geometry and appearance.
Then came DreamFusion. It uses text-to-image diffusion models to supervise 3D modeling from text prompts. However, this approach tended to produce over-saturated colors and represented the 3D scene as a Neural Radiance Field (NeRF), which is impractical for standard computer graphics pipelines.
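The supervision trick behind DreamFusion, Score Distillation Sampling (SDS), can be sketched in a few lines. This is a minimal illustration, not DreamFusion's implementation; `predict_noise` is a hypothetical placeholder for a real pretrained, text-conditioned diffusion model's noise predictor:

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_gradient(rendered, predict_noise, t, alpha_bar):
    """Sketch of a Score Distillation Sampling gradient step.

    `rendered` is an image rendered from the current 3D scene,
    `predict_noise` stands in for a pretrained diffusion model's
    noise predictor (hypothetical here), and `alpha_bar` is the
    noise schedule coefficient for timestep `t`.
    """
    noise = rng.standard_normal(rendered.shape)
    # Diffuse the rendered image to noise level t.
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * noise
    # The SDS gradient w.r.t. the image is the residual between the
    # model's predicted noise and the injected noise; DreamFusion
    # backpropagates this through a differentiable renderer into the
    # 3D scene parameters.
    return predict_noise(noisy, t) - noise

# Example: a trivial predictor that always outputs zeros yields a
# gradient of -noise, with the same shape as the rendered image.
grad = sds_gradient(np.zeros((4, 4, 3)), lambda x, t: np.zeros_like(x),
                    t=10, alpha_bar=0.5)
```

Intuitively, the diffusion model "pulls" each rendered view toward images it considers plausible for the text prompt, and those pulls accumulate in the 3D representation.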
So, what can be done to solve these problems? How can we have an AI model that generates realistic 3D meshes? The answer is TextMesh.
TextMesh is a novel method for 3D shape generation from text prompts that produces photorealistic 3D content in the form of standard 3D meshes. TextMesh modifies DreamFusion to model radiance as a signed distance function (SDF), allowing straightforward extraction of the surface as the 0-level set of the obtained volume. Moreover, TextMesh retextures the output by leveraging another diffusion model, conditioned on color and depth rendered from the mesh.
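The "0-level set" idea is easy to see with a toy SDF. The snippet below uses a unit sphere as a stand-in for the learned SDF that TextMesh optimizes: the SDF is negative inside the shape and positive outside, so the surface sits exactly where the sign flips, which is what a mesh-extraction algorithm such as marching cubes locates on a 3D grid:

```python
import numpy as np

def sdf_sphere(points, radius=1.0):
    """Toy signed distance function: negative inside the sphere,
    zero on the surface, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

# Sample the SDF along a ray leaving the origin. The sign of the SDF
# flips where the ray crosses the surface, i.e. at distance `radius`.
ts = np.linspace(0.0, 2.0, 400)
points = ts[:, None] * np.array([1.0, 0.0, 0.0])
values = sdf_sphere(points)
idx = np.where(np.diff(np.sign(values)) > 0)[0][0]
crossing = ts[idx]  # ≈ 1.0, the sphere's radius
```

Running marching cubes over such a grid of SDF samples (e.g. with `skimage.measure.marching_cubes` at `level=0.0`) yields vertices and triangle faces directly, which is why an SDF representation is so convenient for producing standard meshes.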
TextMesh also proposes a novel multi-view consistent, mesh-conditioned re-texturing method that enables the generation of photorealistic 3D mesh models. The refined texture is trained on several views simultaneously through the diffusion model to ensure smooth transitions.
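Why training on several views at once helps can be illustrated with a toy aggregation scheme. This is an assumption-laden sketch, not TextMesh's actual procedure: each view proposes colors for the texels it sees, and texels visible from multiple views receive the average of the proposals, so there is no hard seam where one view's coverage ends and another's begins:

```python
import numpy as np

# A tiny 8x8 RGB texture shared by all views.
texture = np.zeros((8, 8, 3))
weight = np.zeros((8, 8, 1))

# Each (columns, color) pair is a hypothetical view: the texel columns
# it covers and the flat color it proposes for them.
views = [
    (slice(0, 5), np.array([1.0, 0.0, 0.0])),  # view A covers the left
    (slice(3, 8), np.array([0.0, 0.0, 1.0])),  # view B covers the right
]
for cols, color in views:
    texture[:, cols] += color   # accumulate each view's proposal
    weight[:, cols] += 1.0      # count how many views saw each texel

# Average where views overlap; texels seen by both views blend smoothly
# instead of switching abruptly from one view's color to the other's.
texture = texture / np.maximum(weight, 1.0)
```

In the overlap (columns 3–4) the texture ends up purple, the mean of the two views' proposals, which is the kind of smooth transition the multi-view training objective encourages.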
Overall, TextMesh modifies DreamFusion to model radiance as an SDF, tailoring the model toward mesh extraction, and proposes a novel multi-view consistent, mesh-conditioned re-texturing method. TextMesh generates 3D meshes that significantly improve upon previous methods in realism and can be used directly within standard computer graphics pipelines and in AR or VR applications.
Check out the Project. Don't forget to join our 21k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.