Top 10 Pre-Trained Models for Image Embedding every Data Scientist Should Know

Image by Chen from Pixabay

The rapid developments in computer vision, and image classification use cases in particular, have been further accelerated by the advent of transfer learning. Training a computer vision neural network model on a large dataset of images takes a lot of computational resources and time.

Luckily, this time and these resources can be cut down by using pre-trained models. The technique of leveraging feature representations from a pre-trained model is known as transfer learning. Pre-trained models are generally trained using high-end computational resources and massive datasets.

Pre-trained models can be utilized in various ways:

  • Using the pre-trained weights and directly making predictions on the test data
  • Using the pre-trained weights for initialization and training the model on a custom dataset (see the sketch after this list)
  • Using only the architecture of the pre-trained network and training it from scratch on a custom dataset
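
Here is a minimal sketch of the second approach, assuming a hypothetical 10-class custom dataset; the train_ds dataset and the class count are illustrative, not from any specific example:

import tensorflow as tf

# Freeze a pre-trained backbone and train only a new classification head.
# The 10-class head and the train_ds dataset are assumptions for illustration.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False  # keep the pre-trained weights fixed

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, epochs=5)  # train_ds: batches of (224, 224, 3) images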

This article walks through the top 10 state-of-the-art pre-trained models for obtaining image embeddings. All of these pre-trained models can be loaded as Keras models using the keras.applications API.
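
For instance, here is a minimal sketch of extracting an embedding vector (rather than class scores) with one of these models; the random array simply stands in for a real preprocessed image:

import numpy as np
import tensorflow as tf

# Dropping the classification head (include_top=False) and applying global
# average pooling turns the final feature maps into one vector per image.
backbone = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", pooling="avg"
)

images = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
images = tf.keras.applications.vgg16.preprocess_input(images)

embeddings = backbone.predict(images)
print(embeddings.shape)  # (1, 512) for the VGG-16 backbone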

CNN architectures discussed in this article:
1) VGG
2) Xception
3) ResNet
4) InceptionV3
5) InceptionResNet
6) MobileNet
7) DenseNet
8) NasNet
9) EfficientNet
10) ConvNeXt

1) VGG:

The VGG-16/19 networks were introduced at the ILSVRC 2014 conference and remain among the most popular pre-trained models. They were developed by the Visual Geometry Group at the University of Oxford.

There are two variations of the VGG model, a 16-layer and a 19-layer network, with VGG-19 (the 19-layer network) being an improvement over the VGG-16 (16-layer network) model.

Architecture:

(Source), VGG-16 Network architecture

The VGG network is simple and sequential in nature and uses a large number of filters. At each stage, small (3×3) filters are used to reduce the number of parameters.

The VGG-16 network consists of the following:

  • Convolutional Layers = 13
  • Pooling Layers = 5
  • Fully Connected Dense Layers = 3

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for VGG-16/19:

  • Paper Link: https://arxiv.org/pdf/1409.1556.pdf
  • GitHub: VGG
  • Published On: April 2015
  • Performance on ImageNet Dataset: 71% (Top 1 Accuracy), 90% (Top 5 Accuracy)
  • Number of Parameters: ~140M
  • Number of Layers: 16/19
  • Size on Disk: ~530MB

Implementation:

import tensorflow as tf

# Instantiate VGG-16 with pre-trained ImageNet weights
model = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates VGG-16; Keras offers a similar API for VGG-19. For more details, refer to this documentation.
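
As a quick check of the layer counts listed above, one can inspect the loaded model (a small sketch, nothing model-specific assumed):

import tensorflow as tf

model = tf.keras.applications.VGG16(weights="imagenet")
model.summary()  # full layer-by-layer listing

# Count the layer types mentioned above
conv = sum(isinstance(l, tf.keras.layers.Conv2D) for l in model.layers)
pool = sum(isinstance(l, tf.keras.layers.MaxPooling2D) for l in model.layers)
dense = sum(isinstance(l, tf.keras.layers.Dense) for l in model.layers)
print(conv, pool, dense)  # 13, 5, 3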

2) Xception:

Xception is a deep CNN architecture that involves depthwise separable convolutions. A depthwise separable convolution can be understood as an Inception module with a maximally large number of towers.
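
To see why depthwise separable convolutions matter, the following sketch compares the parameter counts of a standard convolution and its depthwise separable counterpart on a hypothetical input (the shapes and filter counts are illustrative):

import tensorflow as tf

# Hypothetical feature map: 32x32 spatial size, 64 channels, 128 output filters
inputs = tf.keras.Input(shape=(32, 32, 64))

standard = tf.keras.layers.Conv2D(128, 3, padding="same")
separable = tf.keras.layers.SeparableConv2D(128, 3, padding="same")
standard(inputs)
separable(inputs)

print(standard.count_params())   # 64*3*3*128 + 128 bias = 73,856
print(separable.count_params())  # 3*3*64 depthwise + 64*128 pointwise + 128 bias = 8,896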

Architecture:

(Source), Xception architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for Xception:

  • Paper Link: https://arxiv.org/pdf/1610.02357.pdf
  • GitHub: Xception
  • Published On: April 2017
  • Performance on ImageNet Dataset: 79% (Top 1 Accuracy), 94.5% (Top 5 Accuracy)
  • Number of Parameters: ~30M
  • Depth: 81
  • Size on Disk: 88MB

Implementation:

  • Instantiate the Xception model using the following code:

model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates Xception; for more details, refer to this documentation.

3) ResNet:

The previous CNN architectures were not designed to scale to many convolutional layers. Adding new layers to an existing architecture resulted in a vanishing gradient problem and limited performance.

The ResNet architecture introduces skip connections to resolve the vanishing gradient problem.
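
A simplified sketch of such a skip connection in Keras (an identity block; the actual ResNet-50 blocks also use 1×1 bottleneck convolutions):

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    # The shortcut lets gradients bypass the convolutions entirely
    shortcut = x
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([shortcut, x])  # skip connection
    return layers.ReLU()(x)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, 64)
print(outputs.shape)  # (None, 56, 56, 64)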

Architecture:

(Source), ResNet architecture

This ResNet model uses a 34-layer network architecture inspired by the VGG-19 model, to which shortcut connections are added. These shortcut connections convert the architecture into a residual network.

There are several versions of ResNet architecture:

  • ResNet50
  • ResNet50V2
  • ResNet101
  • ResNet101V2
  • ResNet152
  • ResNet152V2

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for ResNet models:

  • Paper Link: https://arxiv.org/pdf/1512.03385.pdf
  • GitHub: ResNet
  • Published On: Dec 2015
  • Performance on ImageNet Dataset: 75–78% (Top 1 Accuracy), 92–93% (Top 5 Accuracy)
  • Number of Parameters: 25–60M
  • Depth: 107–307
  • Size on Disk: ~100–230MB

Implementation:

  • Instantiate the ResNet50 model using the following code:

model = tf.keras.applications.ResNet50(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
)

The above code instantiates ResNet50; Keras offers a similar API for the other ResNet architectures. For more details, refer to this documentation.

4) Inception:

Stacking many deep convolutional layers resulted in overfitting of the data. To avoid overfitting, the Inception model uses parallel layers, i.e., multiple filters of different sizes at the same level, making the model wider rather than deeper. The Inception V1 module is made of four parallel branches: (1×1), (3×3), and (5×5) convolutions, plus (3×3) max pooling, as sketched below.
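
A schematic version of those four parallel branches (the real module also adds 1×1 reductions before the larger convolutions; the filter counts here are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def naive_inception_module(x, f1, f3, f5):
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    # Branch outputs are stacked along the channel axis
    return layers.Concatenate()([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = naive_inception_module(inputs, 64, 128, 32)
print(outputs.shape)  # (None, 28, 28, 416): 64 + 128 + 32 + 192 channels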

Inception (V1/V2/V3) is a CNN-based deep learning network developed by a team at Google. InceptionV3 is an advanced and optimized version of the InceptionV1 and V2 models.

Architecture:

The InceptionV3 model is made up of 42 layers. The architecture of InceptionV3 is built up progressively from:

  • Factorized Convolutions
  • Smaller Convolutions
  • Asymmetric Convolutions
  • Auxiliary Classifiers
  • Grid Size Reduction

All these concepts are consolidated into the final architecture shown below:

(Source), InceptionV3 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for InceptionV3 models:

Implementation:

  • Instantiate the InceptionV3 model using the following code:

model = tf.keras.applications.InceptionV3(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates InceptionV3; for more details, refer to this documentation.

5) InceptionResNet:

Inception-ResNet-V2 is a CNN model developed by researchers at Google. The goal of this model was to reduce the complexity of InceptionV3 and explore the possibility of using residual connections in the Inception model.

Architecture:

(Source), Inception-ResNet-V2 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional image embedding

Other Details for Inception-ResNet-V2 models:

Implementation:

  • Instantiate the Inception-ResNet-V2 model using the following code:

model = tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates Inception-ResNet-V2; for more details, refer to this documentation.

6) MobileNet:

MobileNet is a streamlined architecture that uses depthwise separable convolutions to construct deep convolutional neural networks and provides an efficient model for mobile and embedded vision applications.

Architecture:

(Source), Mobile-Net architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for MobileNet models:

Implementation:

  • Instantiate the MobileNet model using the following code:

model = tf.keras.applications.MobileNet(
    input_shape=None,
    alpha=1.0,
    depth_multiplier=1,
    dropout=0.001,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates MobileNet; Keras offers a similar API for the other MobileNet architectures (MobileNet-V2, MobileNet-V3). For more details, refer to this documentation. The alpha argument is worth noting; a sketch of its effect follows.
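
The alpha width multiplier thins every layer, trading accuracy for size and latency. A small sketch (pre-trained ImageNet weights are only published for certain alpha values, such as 0.25, 0.50, 0.75, and 1.0):

import tensorflow as tf

small = tf.keras.applications.MobileNet(alpha=0.25, weights="imagenet")
full = tf.keras.applications.MobileNet(alpha=1.0, weights="imagenet")

# The thinned model has a fraction of the parameters of the full model
print(small.count_params(), full.count_params())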

7) DenseNet:

DenseNet is a CNN model developed to improve the accuracy degradation caused by vanishing gradients in deep neural networks: because of the long distance between the input and output layers, information can vanish before reaching its destination. DenseNet counters this with dense connectivity, in which each layer receives the feature maps of all preceding layers, as sketched below.
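
A schematic dense block in Keras (a simplification; the paper's layers use a BN-ReLU-Conv ordering and 1×1 bottlenecks, and the layer count and growth rate here are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    for _ in range(num_layers):
        out = layers.BatchNormalization()(x)
        out = layers.ReLU()(out)
        out = layers.Conv2D(growth_rate, 3, padding="same")(out)
        x = layers.Concatenate()([x, out])  # each layer sees all earlier maps
    return x

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = dense_block(inputs)
print(outputs.shape)  # (None, 56, 56, 192): channels grow by 32 per layer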

Architecture:

A DenseNet architecture has 3 dense blocks. The layers between two adjacent blocks are referred to as transition layers and change the feature-map sizes via convolution and pooling.

(Source), DenseNet architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional image embedding

Other Details for DenseNet models:

Implementation:

  • Instantiate the DenseNet121 model using the following code:

model = tf.keras.applications.DenseNet121(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates DenseNet121; Keras offers a similar API for the other DenseNet architectures (DenseNet-169, DenseNet-201). For more details, refer to this documentation.

8) NasNet:

Google researchers designed the NasNet model, which frames the problem of finding the best CNN architecture as a reinforcement learning problem. The idea is to search for the best combination of parameters in a given search space: the number of layers, filter sizes, strides, output channels, etc.

Input: Image of dimensions (331, 331, 3) for NASNetLarge; NASNetMobile expects (224, 224, 3)

Other Details for NasNet models:

  • Paper Link: https://arxiv.org/pdf/1707.07012.pdf
  • Published On: Apr 2018
  • Performance on ImageNet Dataset: 75–83% (Top 1 Accuracy), 92–96% (Top 5 Accuracy)
  • Number of Parameters: 5–90M
  • Depth: 389–533
  • Size on Disk: 23–343MB

Implementation:

  • Instantiate the NASNetLarge model using the following code:

model = tf.keras.applications.NASNetLarge(
    input_shape=None,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates NASNetLarge; Keras offers a similar API for the NASNetMobile architecture. For more details, refer to this documentation.

9) EfficientNet:

EfficientNet is a CNN architecture from researchers at Google that achieves better performance through a scaling method called compound scaling. This method scales all three dimensions of the network (depth, width, and resolution) uniformly, using a fixed set of scaling ratios governed by a compound coefficient, as sketched below.
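
Concretely, for a compound coefficient phi, depth scales as alpha**phi, width as beta**phi, and resolution as gamma**phi, with alpha * beta**2 * gamma**2 held near 2; the paper's grid search found alpha=1.2, beta=1.1, gamma=1.15. A small sketch:

# Compound scaling: one knob (phi) scales depth, width, and resolution together
alpha, beta, gamma = 1.2, 1.1, 1.15  # values from the EfficientNet paper

for phi in range(4):
    d, w, r = alpha ** phi, beta ** phi, gamma ** phi
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")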

Architecture:

(Source), EfficientNet-B0 architecture

Other Details for EfficientNet Models:

Implementation:

  • Instantiate the EfficientNet-B0 model using the following code:

model = tf.keras.applications.EfficientNetB0(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates EfficientNet-B0; Keras offers a similar API for the other EfficientNet architectures (EfficientNet-B1 to B7, EfficientNet-V2-B0 to B3). For more details, refer to this documentation and this documentation.

10) ConvNeXt:

The ConvNeXt CNN model was proposed as a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that is claimed to outperform them.

Architecture:

(Source), ConvNeXt architecture

Other Details for ConvNeXt models:

Implementation:

  • Instantiate the ConvNeXt-Tiny model using the following code:

model = tf.keras.applications.ConvNeXtTiny(
    model_name="convnext_tiny",
    include_top=True,
    include_preprocessing=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The above code instantiates ConvNeXt-Tiny; Keras offers a similar API for the other ConvNeXt architectures (ConvNeXt-Small, ConvNeXt-Base, ConvNeXt-Large, ConvNeXt-XLarge). For more details, refer to this documentation.
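
To close the loop on image embeddings, here is a sketch of comparing two images with ConvNeXt-Tiny embeddings (random arrays stand in for real images; with include_preprocessing=True, raw pixel values in the 0-255 range can be fed directly):

import numpy as np
import tensorflow as tf

backbone = tf.keras.applications.ConvNeXtTiny(
    include_top=False, weights="imagenet", pooling="avg"
)

images = np.random.randint(0, 256, size=(2, 224, 224, 3)).astype("float32")
emb = backbone.predict(images)

# Cosine similarity between the two embedding vectors
cos = np.dot(emb[0], emb[1]) / (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1]))
print(emb.shape, cos)  # emb.shape should be (2, 768) for ConvNeXt-Tiny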
