Cook your First U-Net in PyTorch

A magic recipe to empower your image segmentation projects

Photo by Stefan C. Asafti on Unsplash

U-Net is a deep learning architecture used for semantic segmentation tasks in image analysis. It was introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in a paper titled “U-Net: Convolutional Networks for Biomedical Image Segmentation”.

It is especially effective for biomedical image segmentation tasks because it can handle images of arbitrary size and produces smooth, high-quality segmentation masks with sharp object boundaries. It has since been widely adopted in many other image segmentation tasks, such as satellite and aerial imagery analysis, as well as natural image segmentation.

In this tutorial, we’ll learn more about U-Net, how it works, and cook our own recipe (implementation) using PyTorch. So, let’s go!

How does it work?

The U-Net architecture consists of two parts: an encoder and a decoder.

The U-Net architecture (from “U-Net: Convolutional Networks for Biomedical Image Segmentation”)

Encoder (Contraction Path)

The encoder is a series of convolutional and pooling layers that progressively downsample the input image to extract features at multiple scales.

In the encoder, the spatial size of the image is progressively reduced while the depth (number of channels) progressively increases. This basically means the network learns the “WHAT” information in the image; however, it loses the “WHERE” information.
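To make this concrete, here is a minimal sketch of a single encoder block (the 256×256 input size and the 3→64 channel counts are just example values, not taken from the article): two 3×3 convolutions followed by a 2×2 max-pooling, so the spatial size halves while the channel depth grows.

import torch
import torch.nn as nn

# One encoder block (example values): spatial size halves, channel depth grows
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 256, 256)  # a single 256x256 RGB image
print(block(x).shape)            # torch.Size([1, 64, 128, 128])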

Decoder (Expansion Path)

The decoder consists of a series of convolutional and upsampling layers that upsample the feature maps to the original input image size while also incorporating the high-resolution features from the encoder via skip connections. This allows the decoder to produce segmentation masks that have the same size as the original input image.

You can learn more about upsampling and the transposed convolution in this great article.

In the decoder, the spatial size of the image progressively increases while the depth progressively decreases. This basically means the network recovers the “WHERE” information in the image by progressively applying up-sampling.
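As a small illustration (the channel counts and 32×32/64×64 sizes below are example values), a transposed convolution doubles the spatial size of a feature map, and the matching encoder feature map is then concatenated along the channel dimension before further convolutions:

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)

bottleneck = torch.randn(1, 1024, 32, 32)  # deepest feature map (example size)
skip = torch.randn(1, 512, 64, 64)         # matching encoder feature map

x = up(bottleneck)               # -> torch.Size([1, 512, 64, 64])
x = torch.cat([x, skip], dim=1)  # -> torch.Size([1, 1024, 64, 64])
print(x.shape)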

Final Layer

At the final layer, a 1×1 convolution is used to map each 64-component feature vector to the desired number of classes.
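For example (assuming n_class = 2, i.e. binary segmentation, which is just an illustrative choice), the 1×1 convolution only mixes channels and leaves the spatial dimensions untouched:

import torch
import torch.nn as nn

n_class = 2                                  # example: binary segmentation
out_conv = nn.Conv2d(64, n_class, kernel_size=1)

features = torch.randn(1, 64, 128, 128)      # a 64-channel feature map (example size)
print(out_conv(features).shape)              # torch.Size([1, 2, 128, 128])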

Our cooking recipe!

We’ll do a very straightforward implementation; it will help to keep the above architecture diagram in front of you while coding.

Imports

First, the necessary modules are imported from the torch and torchvision packages, including the nn module for building neural networks and the pre-trained models provided in torchvision.models. The relu function is also imported from torch.nn.functional.

import torch
import torch.nn as nn
from torchvision import models
from torch.nn.functional import relu

UNet Class

Then, a custom class UNet is defined as a subclass of nn.Module. The __init__ method initializes the architecture of the U-Net by defining the layers for both the encoder and decoder parts of the network. The argument n_class specifies the number of classes for the segmentation task.

class UNet(nn.Module):
    def __init__(self, n_class):
        super().__init__()

        # Encoder
        # In the encoder, convolutional layers (Conv2d) are used to extract features from the input image.
        # Each block in the encoder consists of two convolutional layers followed by a max-pooling layer,
        # except the last block, which does not include a max-pooling layer.
        # Note: the paper's figure uses a 572x572 input with unpadded convolutions; here padding=1 preserves
        # the spatial size, so the sizes below assume an example 512x512x3 input (any size divisible by 16 works).
        # -------
        # input: 512x512x3
        self.e11 = nn.Conv2d(3, 64, kernel_size=3, padding=1)     # output: 512x512x64
        self.e12 = nn.Conv2d(64, 64, kernel_size=3, padding=1)    # output: 512x512x64
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)        # output: 256x256x64

        # input: 256x256x64
        self.e21 = nn.Conv2d(64, 128, kernel_size=3, padding=1)   # output: 256x256x128
        self.e22 = nn.Conv2d(128, 128, kernel_size=3, padding=1)  # output: 256x256x128
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)        # output: 128x128x128

        # input: 128x128x128
        self.e31 = nn.Conv2d(128, 256, kernel_size=3, padding=1)  # output: 128x128x256
        self.e32 = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # output: 128x128x256
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)        # output: 64x64x256

        # input: 64x64x256
        self.e41 = nn.Conv2d(256, 512, kernel_size=3, padding=1)  # output: 64x64x512
        self.e42 = nn.Conv2d(512, 512, kernel_size=3, padding=1)  # output: 64x64x512
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)        # output: 32x32x512

        # input: 32x32x512
        self.e51 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)   # output: 32x32x1024
        self.e52 = nn.Conv2d(1024, 1024, kernel_size=3, padding=1)  # output: 32x32x1024

        # Decoder
        # Each decoder block upsamples with a transposed convolution, concatenates the matching
        # encoder feature map (skip connection), and applies two convolutions.
        self.upconv1 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.d11 = nn.Conv2d(1024, 512, kernel_size=3, padding=1)
        self.d12 = nn.Conv2d(512, 512, kernel_size=3, padding=1)

        self.upconv2 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.d21 = nn.Conv2d(512, 256, kernel_size=3, padding=1)
        self.d22 = nn.Conv2d(256, 256, kernel_size=3, padding=1)

        self.upconv3 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.d31 = nn.Conv2d(256, 128, kernel_size=3, padding=1)
        self.d32 = nn.Conv2d(128, 128, kernel_size=3, padding=1)

        self.upconv4 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.d41 = nn.Conv2d(128, 64, kernel_size=3, padding=1)
        self.d42 = nn.Conv2d(64, 64, kernel_size=3, padding=1)

        # Output layer: a 1x1 convolution maps the 64 channels to n_class channels
        self.outconv = nn.Conv2d(64, n_class, kernel_size=1)

In the U-Net paper, 0 padding was used and post-processing techniques were applied to restore the original size of the image. Here, however, we use a padding of 1 so that the final feature map is not cropped and there is no need to apply any post-processing to the output image.
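As a quick check of that claim (a minimal sketch; the 572×572 size is the one used in the paper’s figure), a 3×3 convolution with no padding shrinks the feature map by two pixels, while a padding of 1 preserves the spatial size:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 572, 572)

valid = nn.Conv2d(3, 64, kernel_size=3, padding=0)  # unpadded, as in the original paper
same = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # padding of 1, as used here

print(valid(x).shape)  # torch.Size([1, 64, 570, 570]) -- shrinks by 2 pixels per convolution
print(same(x).shape)   # torch.Size([1, 64, 572, 572]) -- spatial size preserved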

Forward Method

The forward method specifies how the input is processed through the network. The input image is first passed through the encoder layers to extract the features. Then, the decoder layers are used to upsample the features to the original image size while concatenating the corresponding encoder feature maps. Finally, the output layer uses a 1×1 convolutional layer to map the features to the desired number of output classes.

    def forward(self, x):
        # Encoder
        xe11 = relu(self.e11(x))
        xe12 = relu(self.e12(xe11))
        xp1 = self.pool1(xe12)

        xe21 = relu(self.e21(xp1))
        xe22 = relu(self.e22(xe21))
        xp2 = self.pool2(xe22)

        xe31 = relu(self.e31(xp2))
        xe32 = relu(self.e32(xe31))
        xp3 = self.pool3(xe32)

        xe41 = relu(self.e41(xp3))
        xe42 = relu(self.e42(xe41))
        xp4 = self.pool4(xe42)

        xe51 = relu(self.e51(xp4))
        xe52 = relu(self.e52(xe51))

        # Decoder
        xu1 = self.upconv1(xe52)
        xu11 = torch.cat([xu1, xe42], dim=1)  # skip connection from the fourth encoder block
        xd11 = relu(self.d11(xu11))
        xd12 = relu(self.d12(xd11))

        xu2 = self.upconv2(xd12)
        xu22 = torch.cat([xu2, xe32], dim=1)  # skip connection from the third encoder block
        xd21 = relu(self.d21(xu22))
        xd22 = relu(self.d22(xd21))

        xu3 = self.upconv3(xd22)
        xu33 = torch.cat([xu3, xe22], dim=1)  # skip connection from the second encoder block
        xd31 = relu(self.d31(xu33))
        xd32 = relu(self.d32(xd31))

        xu4 = self.upconv4(xd32)
        xu44 = torch.cat([xu4, xe12], dim=1)  # skip connection from the first encoder block
        xd41 = relu(self.d41(xu44))
        xd42 = relu(self.d42(xd41))

        # Output layer
        out = self.outconv(xd42)

        return out
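Before moving on, it is worth sanity-checking the whole model with a dummy input (the 512×512 resolution and n_class=2 below are arbitrary example values; with this padded implementation, the input size should be divisible by 16 so the skip connections line up): the output should have one channel per class and the same spatial size as the input.

model = UNet(n_class=2)           # example: binary segmentation
x = torch.randn(1, 3, 512, 512)   # input size divisible by 16
with torch.no_grad():
    out = model(x)
print(out.shape)                  # torch.Size([1, 2, 512, 512])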

That’s it!

Congratulations on successfully implementing your first U-Net model in PyTorch! By following this recipe, you have gained the knowledge to implement U-Net and can now apply it to any image segmentation problem you may encounter in the future. However, verifying the sizes and channel numbers is essential to ensure compatibility. The U-Net architecture is a powerful tool in your arsenal that can be applied to various tasks, including medical imaging and autonomous driving. So, go ahead, grab any image segmentation dataset from the web, and start testing your code!

For convenience, I have added a simple test script in this repository.

The script generates random images and masks and trains the U-Net model to segment them. It has a function called generate_random_data() that creates input images and their corresponding masks with geometric shapes like triangles, circles, squares, and crosses. The U-Net model is trained on these random images and masks, then tested on new random images, and the segmentation results are plotted using the plot_img_array() function. The script uses PyTorch to train the U-Net model and various helper functions to add shapes to the input images and masks.

Consider downloading it and running the tests using this snippet:

import test
test.run(UNet)
Expected test output (image by author).

Final Thoughts

In conclusion, the U-Net architecture has become incredibly popular in the computer vision community due to its effectiveness in solving various image segmentation tasks. Its unique design, which includes a contracting path followed by an expanding path, allows it to capture both local and global features of an image while preserving spatial information.

Furthermore, the flexibility of the U-Net architecture makes it possible to modify and improve the network to fit specific needs. Researchers have proposed various modifications to the original U-Net architecture, including changing the convolutional layers, incorporating attention mechanisms, and adding further skip connections, among others. These modifications have resulted in improved performance and better segmentation results in various applications.

Overall, the U-Net architecture has proven to be a reliable and versatile solution for image segmentation tasks. As computer vision continues to advance, it is likely that we will see further innovations and modifications to the U-Net architecture that improve its performance and make it even more effective at solving real-world problems.

Don’t hesitate to share your thoughts with me!
