We’re all amazed by the recent advancements in generative AI, but that doesn’t mean there haven’t been significant breakthroughs in other applications. For instance, the computer vision domain has been seeing relatively rapid advancements recently as well. The Segment Anything Model (SAM) released by Meta was an enormous success and changed the game in 2D image segmentation entirely.
In image segmentation, the goal is to detect and, in a sense, “paint” all of the objects in the scene. Usually, this is done by training a model on a dataset of the objects we want to segment. Then, we can use the model to segment those same objects in different images. However, the fundamental problem here is that the model is bounded by the objects we show it during training; it cannot segment unseen objects.
With SAM, this has changed. SAM is the first model that can segment anything, literally. This is achieved by training SAM on large-scale data and giving it the ability to perform zero-shot segmentation across various types of image data. It is designed to automatically segment objects of interest in images, regardless of their shape, size, or appearance. SAM has demonstrated remarkable performance in segmenting objects in 2D images, revolutionizing the field of computer vision.
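To give a sense of how simple zero-shot prompting is in practice, here is a minimal sketch using Meta’s segment-anything package; the checkpoint file name, image path, and click coordinates are placeholders you would swap for your own.

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pre-trained SAM checkpoint (the ViT-H variant here).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image and hand it to the predictor (SAM expects RGB).
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click (x, y) is enough to prompt a segmentation --
# no class labels and no fine-tuning involved.
point = np.array([[500, 375]])   # placeholder click location
label = np.array([1])            # 1 = foreground point
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)
print(masks.shape, scores)       # several candidate masks with quality scores
```

By default the predictor returns a few candidate masks per prompt, so the caller can pick the one with the highest score.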
Of course, people didn’t simply stop there. They began working on ways to extend SAM’s capabilities beyond 2D. However, a key question has remained unanswered: can SAM’s segmentation ability be extended to 3D, thereby bridging the gap between 2D and 3D perception caused by data scarcity? The answer is looking like yes, and it’s time to meet SA3D.
SA3D leverages advancements in Neural Radiance Fields (NeRF) and the SAM model to revolutionize 3D segmentation. NeRF has emerged as one of the most popular 3D representations in recent years. NeRF builds connections between sparse 2D images and real 3D points through differentiable volume rendering. It has seen numerous improvements, making it a powerful tool for tackling the challenges of 3D perception.
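For the curious, the “differentiable volume rendering” at NeRF’s core boils down to a short quadrature along each camera ray. Below is a minimal PyTorch sketch of that standard computation from the original NeRF formulation; the per-ray framing and tensor shapes are simplified for illustration.

```python
import torch

def volume_render(sigmas, colors, deltas):
    """Discrete NeRF volume rendering along a single ray.

    sigmas: (N,)   densities predicted at N samples along the ray
    colors: (N, 3) RGB values predicted at those samples
    deltas: (N,)   distances between consecutive samples
    """
    # Opacity contributed by each sample.
    alphas = 1.0 - torch.exp(-sigmas * deltas)
    # Transmittance T_i: probability the ray survives up to sample i.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=0)
    trans = torch.cat([torch.ones(1), trans[:-1]])
    # Per-sample weights, then alpha-composite into the final pixel color.
    weights = alphas * trans
    return (weights[:, None] * colors).sum(dim=0)
```

Because every step is differentiable, the network’s densities and colors can be trained directly from 2D photos by backpropagating a pixel-wise loss through this compositing.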
There have been some attempts to extend NeRF-based techniques to 3D segmentation. These approaches involved training an additional feature field aligned with a pre-trained 2D visual backbone. While effective, these methods suffer from limitations such as a high memory footprint, artifacts in the radiance fields affecting the feature fields, and inefficiency due to the need to train an additional feature field for each scene.
This is where SA3D comes into play. Unlike previous methods, SA3D doesn’t require training an additional feature field. Instead, it leverages the power of SAM and NeRF to segment desired objects from all views automatically.
SA3D works by taking user-specified prompts from a single rendered view to initiate the segmentation process. The segmentation maps generated by SAM are then projected onto 3D mask grids using density-guided inverse rendering, providing initial 3D segmentation results. To refine the segmentation, incomplete 2D masks from other views are rendered and used as cross-view self-prompts. These masks are fed into SAM to generate refined masks, which are then projected onto the 3D mask grids. This iterative process allows for the generation of complete 3D segmentation results, as the sketch below illustrates.
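To make that loop concrete, here is a high-level pseudocode sketch of the procedure as described above. Note that every helper here (inverse_render, render_mask, prompts_from_mask, and the nerf and sam_predictor interfaces) is a hypothetical placeholder chosen for readability, not SA3D’s actual API.

```python
# Pseudocode sketch of the SA3D loop. All helper names are hypothetical;
# the real SA3D codebase may structure this quite differently.

def segment_3d(nerf, sam_predictor, views, user_prompt):
    # 1. Prompt SAM on a single rendered view to obtain a seed 2D mask.
    first_rgb = nerf.render(views[0])
    sam_predictor.set_image(first_rgb)
    mask_2d, _, _ = sam_predictor.predict(**user_prompt)

    # 2. Lift that mask onto 3D mask grids via density-guided
    #    inverse rendering.
    mask_grid = inverse_render(nerf, views[0], mask_2d[0])

    # 3. Iterate over the remaining views with cross-view self-prompting.
    for view in views[1:]:
        rgb = nerf.render(view)
        # Render the (still incomplete) 3D mask into this view ...
        partial_mask = render_mask(nerf, mask_grid, view)
        # ... derive point prompts from it, and let SAM complete the mask.
        prompts = prompts_from_mask(partial_mask)
        sam_predictor.set_image(rgb)
        refined, _, _ = sam_predictor.predict(**prompts)
        # Accumulate the refined 2D mask back into the 3D mask grids.
        mask_grid = inverse_render(nerf, view, refined[0], init=mask_grid)

    return mask_grid  # voxelized 3D segmentation of the chosen object
```

The key point is that SAM is used as-is in every iteration; only the lightweight 3D mask grids are updated, which is why no per-scene feature field needs to be trained.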
SA3D offers several benefits over previous approaches. It can easily adapt to any pre-trained NeRF model without changes or re-training, making it highly compatible and adaptable. The entire segmentation process with SA3D is efficient, taking roughly two minutes without requiring engineering optimization. This speed makes SA3D a practical solution for real-world applications. Furthermore, experimental results have demonstrated that SA3D can generate fine-grained segmentation results for various kinds of 3D objects, opening up new possibilities for applications such as robotics, augmented reality, and virtual reality.
Check out the Paper, Project, and GitHub link.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.