S-Lab and NTU Researchers Propose Scenimefy: A Novel Semi-Supervised Image-to-Image Translation Framework that Bridges the Gap in Automatic High-Quality Anime Scene Rendering from Real-World Images

Anime scenery takes a great deal of artistic talent and time to create. The development of learning-based methods for automatic scene stylization therefore has undeniable practical and economic significance. Automatic stylization has improved markedly thanks to recent advances in Generative Adversarial Networks (GANs), yet most of this research has focused on human faces. The task of creating high-quality anime scenery from intricate real-world photographs remains understudied despite its significant research value. Converting real-world scene photographs into anime styles is challenging for several reasons.

1) The scene’s composition: Scenes are often composed of multiple objects connected in complicated ways, with a hierarchy between foreground and background elements, as Figure 1 illustrates.

2) Characteristics of anime: Figure 1 shows how pre-designed brush strokes are used for natural elements such as grass, trees, and clouds to create the distinctive textures and fine details that define anime. The organic, hand-drawn nature of these textures makes them considerably harder to mimic than the crisp edges and uniform color patches addressed in earlier work.

3) The data shortage and domain gap: A high-quality anime scene dataset is crucial for bridging the major domain gap between real and anime scenes. Existing datasets are of low quality because they contain a large proportion of human faces and other foreground objects whose aesthetic differs from that of background scenery.

Figure 1: Anime scene characteristics. Hand-drawn brush strokes on grass and stones (foreground) as well as trees and clouds (background), as opposed to crisp edges and flat surfaces, can be seen in a frame from Makoto Shinkai’s 2011 film “Children Who Chase Lost Voices.”

Unsupervised image-to-image translation is a popular approach to sophisticated scene stylization when no paired training data exist. Existing techniques that target anime styles, however, fall short in several areas despite promising results. First, the lack of pixel-wise correspondence in complex scenes makes it difficult for current approaches to perform evident texture stylization while preserving semantic meaning, potentially producing outputs that deviate from the intended style and contain noticeable artifacts. Second, certain methods fail to reproduce the fine details of anime scenes. Their handcrafted anime-specific losses or pre-extracted representations, which enforce edge and surface smoothness, are responsible for this.

To resolve the issues above, researchers from S-Lab, Nanyang Technological University propose Scenimefy, a novel semi-supervised image-to-image (I2I) translation pipeline for creating high-quality anime-style renderings of scene photographs (Figure 2). Their key idea is to introduce a new supervised training branch, driven by generated pseudo-paired data, into the unsupervised framework to address the shortcomings of unsupervised training. They exploit StyleGAN’s advantageous properties by fine-tuning it to produce coarse paired data between the real and anime domains, i.e., pseudo-paired data. A rough sketch of how such a supervised branch can sit alongside the unsupervised one follows.
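The paper defines the precise objectives; purely as an illustration of the semi-supervised structure, the PyTorch sketch below combines a supervised reconstruction loss on a hypothetical pseudo-pair with a standard non-saturating adversarial loss on unpaired data. The function and variable names and the loss weights are placeholder assumptions, not the authors’ implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: one training step of a semi-supervised I2I setup.
# `G` maps real photos to anime style; `D` is an anime-domain discriminator.
# `real` and `anime` are unpaired batches; (`pseudo_src`, `pseudo_tgt`) is a
# StyleGAN-generated pseudo-pair. The loss weights are illustrative only.

def training_step(G, D, real, anime, pseudo_src, pseudo_tgt,
                  lam_sup=1.0, lam_adv=1.0):
    # --- Supervised branch: reconstruction on pseudo-paired data ---
    fake_sup = G(pseudo_src)
    loss_sup = F.l1_loss(fake_sup, pseudo_tgt)

    # --- Unsupervised branch: adversarial loss on unpaired data ---
    fake = G(real)
    loss_adv_g = F.softplus(-D(fake)).mean()  # non-saturating GAN loss

    loss_g = lam_sup * loss_sup + lam_adv * loss_adv_g

    # --- Discriminator: real anime frames vs. translated photos ---
    loss_d = F.softplus(-D(anime)).mean() + F.softplus(D(fake.detach())).mean()
    return loss_g, loss_d
```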

Figure 2: Anime scene renderings produced by Scenimefy. Top row: input images; bottom row: translation results.

They propose a new semantic-constrained fine-tuning approach that uses rich pretrained model priors such as CLIP and VGG to direct StyleGAN toward capturing intricate scene details while reducing overfitting. To filter out low-quality data, they also introduce a segmentation-guided data selection technique. Using the pseudo-paired data and a novel patch-wise contrastive style loss, Scenimefy learns effective pixel-wise correspondence between the two domains and generates fine details. Combined with the unsupervised training branch, their semi-supervised framework strikes a desirable trade-off between the faithfulness and fidelity of scene stylization.
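For intuition only: the patch-wise contrastive style loss is in the spirit of PatchNCE-style losses from contrastive I2I methods such as CUT. The sketch below is a generic version of that idea, assuming patch features have already been extracted and projected into a shared embedding space; it is not the paper’s exact loss.

```python
import torch
import torch.nn.functional as F

def patch_contrastive_loss(feat_q, feat_k, temperature=0.07):
    """Minimal PatchNCE-style loss over N patch embeddings.

    feat_q: (N, C) features of patches from the translated image (queries).
    feat_k: (N, C) features of the spatially corresponding patches in the
            other domain (positives); all other patches act as negatives.
    """
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1)
    logits = feat_q @ feat_k.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    # Diagonal entries are the positives, so each row becomes an
    # N-way classification problem solved with cross-entropy.
    return F.cross_entropy(logits, targets)
```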

They also collected a high-quality dataset of pure anime scenes to facilitate training. Extensive experiments demonstrate Scenimefy’s effectiveness, surpassing state-of-the-art baselines in both perceptual quality and quantitative evaluation. The following is an overview of their main contributions:

• They propose a new semi-supervised scene stylization framework that transforms real photographs into sophisticated, high-quality anime scene images. Their system adds a novel patch-wise contrastive style loss to enhance stylization and fine details.

• A newly developed semantic-constrained StyleGAN fine-tuning technique with rich pretrained prior guidance, followed by a segmentation-guided data selection scheme, produces structure-consistent pseudo-paired data that serves as the basis for the training supervision (see the sketch after this list).

• They collected a high-resolution dataset of anime scenes to support future research on scene stylization.
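As a rough sketch of the second contribution above: a common recipe for keeping a fine-tuned generator semantically aligned with its source model is to penalize feature drift under a frozen pretrained network, and to filter out generated pairs whose segmentation maps disagree. The code below illustrates that general idea with torchvision’s VGG-19 as a stand-in for the CLIP/VGG priors; the layer choice, threshold, and helper names are assumptions, not the authors’ method, and images are assumed to be already normalized for VGG.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG features as a stand-in prior to constrain StyleGAN fine-tuning.
vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def semantic_consistency_loss(img_src, img_finetuned):
    """Penalize semantic drift between the source-model output and the
    fine-tuned (anime-style) output generated from the same latent code."""
    return F.l1_loss(vgg(img_finetuned), vgg(img_src).detach())

def keep_pair(seg_src, seg_tgt, thresh=0.8):
    """Hypothetical segmentation-guided filter: keep a pseudo-pair only if
    the two segmentation label maps agree on enough pixels."""
    agree = (seg_src == seg_tgt).float().mean()
    return agree.item() >= thresh
```

Because the VGG parameters are frozen but gradients still flow through the fine-tuned image, such a loss steers the generator without updating the prior network itself.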


Check out the Paper, Project, and GitHub link. All credit for this research goes to the researchers on this project.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

