
Real-time Face detection | Face Mask Detection using OpenCV


In this article, we are going to find out how to detect faces in real-time using OpenCV. After detecting the face from the webcam stream, we will save the frames containing the face. Later we will pass these frames (images) to our mask detector classifier to find out whether the person is wearing a mask or not.

We are also going to see how to make a custom mask detector using TensorFlow and Keras, but you can skip that, as I will be attaching the trained model file below which you can download and use. Here is the list of subtopics we are going to cover:

  1. What is Face Detection?
  2. Face Detection Methods
  3. Face detection algorithm
  4. Face recognition
  5. Face Detection using Python
  6. Face Detection using OpenCV
  7. Create a model to recognize faces wearing a mask (Optional)
  8. How to do Real-time Mask detection

What is Face Detection?

The goal of face detection is to determine whether there are any faces in the image or video. If multiple faces are present, each face is enclosed by a bounding box, and thus we know the location of the faces.

The primary objective of face detection algorithms is to accurately and efficiently determine the presence and position of faces in an image or video. The algorithms analyze the visual content of the data, searching for patterns and features that correspond to facial characteristics. By employing various techniques, such as machine learning, image processing, and pattern recognition, face detection algorithms aim to differentiate faces from other objects or background elements within the visual data.

Human faces are difficult to model, as there are many variables that can change, for example facial expression, orientation, lighting conditions, and partial occlusions such as sunglasses, scarves, and masks. The result of the detection gives the face location parameters, and these may be required in various forms, for instance a rectangle covering the central part of the face, eye centers, or landmarks including eyes, nose and mouth corners, eyebrows, nostrils, etc.

Face Detection Methods

There are two main approaches to Face Detection:

  1. Feature-Based Approach
  2. Image-Based Approach

Feature-Based Approach

Objects are usually recognized by their unique features, and a human face has many features that distinguish it from other objects. A feature-based approach locates faces by extracting structural features such as the eyes, nose, and mouth, and then uses them to detect a face. Typically, some kind of statistical classifier is trained and then used to separate facial and non-facial regions. In addition, human faces have particular textures which can be used to differentiate between a face and other objects. Furthermore, the edges of facial features can also help in detecting a face. In the coming section, we will implement a feature-based approach using OpenCV.

Image-Based Approach

Generally, image-based methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and non-face images. The learned characteristics take the form of distribution models or discriminant functions that are consequently used for face detection. In this method, we use different algorithms such as neural networks, HMM, SVM, and AdaBoost learning. In the coming section, we will see how we can detect faces with MTCNN, or Multi-Task Cascaded Convolutional Neural Network, which is an image-based approach to face detection.

Face detection algorithm

One of the popular algorithms that uses a feature-based approach is the Viola-Jones algorithm, which I am briefly going to discuss here. If you want to learn about it in detail, I would suggest going through this article, Face Detection using Viola Jones Algorithm.

The Viola-Jones algorithm is named after the two computer vision researchers who proposed the method in 2001, Paul Viola and Michael Jones, in their paper "Rapid Object Detection using a Boosted Cascade of Simple Features". Despite being an older framework, Viola-Jones is quite powerful, and its application has proven to be exceptionally notable in real-time face detection. This algorithm is painfully slow to train but can detect faces in real-time with impressive speed.

Given an image (this algorithm works on grayscale images), the algorithm looks at many smaller subregions and tries to find a face by searching for specific features in each subregion. It needs to check many different positions and scales because an image can contain many faces of various sizes. Viola and Jones used Haar-like features to detect faces in this algorithm.
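To make Haar-like features a little more concrete, here is a minimal sketch, not taken from OpenCV's implementation and with purely illustrative function names (integral_image, rect_sum), of how a single two-rectangle feature can be evaluated with an integral image:

import numpy as np

def integral_image(gray):
    # Cumulative sums over rows and columns; any rectangular sum can then
    # be read off with just four array lookups.
    return gray.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum of the pixels in the rectangle with top-left corner (x, y),
    # width w and height h, computed from the integral image ii.
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

# A two-rectangle "edge" feature on a random 24x24 patch:
# sum of the left half minus sum of the right half.
patch = np.random.randint(0, 256, (24, 24)).astype(np.int64)
ii = integral_image(patch)
feature_value = rect_sum(ii, 0, 0, 12, 24) - rect_sum(ii, 12, 0, 12, 24)
print(feature_value)

The cascade evaluates many such features at many positions and scales, and this constant-time rectangle-sum trick is a big part of what makes the method fast.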

Face Recognition

Face detection and face recognition are often used interchangeably, but they are quite different. In fact, face detection is just a part of face recognition.

Face recognition is a method of identifying or verifying the identity of an individual using their face. There are various algorithms that can do face recognition, but their accuracy may vary. Here I am going to describe how we do face recognition using deep learning.

In fact, here is an article, Face Recognition Python, which shows how to implement face recognition.

Face Detection using Python

As mentioned before, here we are going to see how we can detect faces using an image-based approach. MTCNN, or Multi-Task Cascaded Convolutional Neural Network, is one of the most popular and most accurate face detection tools that works on this principle. It is based on a deep learning architecture; specifically, it consists of three neural networks (P-Net, R-Net, and O-Net) connected in a cascade.

So, let's see how we can use this algorithm in Python to detect faces in real-time. First, you need to install the MTCNN library, which contains a trained model that can detect faces.

pip install mtcnn

Now let us see how to use MTCNN:

from mtcnn import MTCNN
import cv2

detector = MTCNN()
# Open a video stream from the default webcam
video_capture = cv2.VideoCapture(0)

while True:
    ret, frame = video_capture.read()
    frame = cv2.resize(frame, (600, 400))
    # detect_faces() returns a list of detections, each with a bounding
    # box, a confidence score, and facial keypoints
    boxes = detector.detect_faces(frame)
    if boxes:
        box = boxes[0]['box']
        conf = boxes[0]['confidence']
        x, y, w, h = box[0], box[1], box[2], box[3]

        if conf > 0.5:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 255), 1)

    cv2.imshow("Frame", frame)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()
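The snippet above draws only the first face in the list. Since detect_faces() returns one dictionary per face, with 'box', 'confidence', and 'keypoints' entries, a small variation of the detection block inside the same while loop would draw every detected face:

    # Variation of the detection block inside the while loop: draw all faces
    for detection in detector.detect_faces(frame):
        if detection['confidence'] > 0.5:
            x, y, w, h = detection['box']
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 255), 1)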

Face Detection using OpenCV

In this section, we are going to perform real-time face detection using OpenCV on a live stream from our webcam.

As you already know, videos are basically made up of frames, which are still images. We perform face detection for every frame in a video. So when it comes to detecting a face in a still image versus detecting a face in a real-time video stream, there is not much difference between them.

We will be using the Haar Cascade algorithm, also known as the Viola-Jones algorithm, to detect faces. It is basically a machine learning object detection algorithm that is used to identify objects in an image or video. In OpenCV, we have several trained Haar Cascade models which are saved as XML files. Instead of creating and training the model from scratch, we use one of these files. We are going to use the "haarcascade_frontalface_alt2.xml" file in this project. Now let us start coding this up.

The first step is to find the path to the "haarcascade_frontalface_alt2.xml" file. We do this by using the os module of Python.

import cv2
import os
cascPath = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
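Alternatively, if you installed OpenCV through the opencv-python pip package, the bundled cascade directory is also exposed as cv2.data.haarcascades, so the same path could be built like this:

import os
import cv2

# cv2.data.haarcascades points at the folder that ships the bundled cascade XML files
cascPath = os.path.join(cv2.data.haarcascades, "haarcascade_frontalface_alt2.xml")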

The next step is to load our classifier. The path to the above XML file goes as an argument to the CascadeClassifier() method of OpenCV.

faceCascade = cv2.CascadeClassifier(cascPath)

After loading the classifier, let us open the webcam using this simple OpenCV one-liner:

video_capture = cv2.VideoCapture(0)

Next, we need to get the frames from the webcam stream. We do this using the read() function. We use it in an infinite loop to get all the frames until we want to close the stream.

while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()

The read() function returns:

  1. The actual video frame read (one frame on each loop)
  2. A return code

The return code tells us if we have run out of frames, which can happen if we are reading from a file. This doesn't matter much when reading from the webcam, since we can record forever, so we will ignore it here.
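If you were reading from a video file instead, one reasonable way to use that return code, as a small variation on the loop above, is to stop once no frame comes back:

    ret, frame = video_capture.read()
    if not ret:
        # no frame was returned: the file has ended (or the camera failed), so stop
        break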

For this particular classifier to work, we need to convert the frame into grayscale.

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

The faceCascade object has a method detectMultiScale(), which receives a frame (image) as an argument and runs the classifier cascade over the image. The term MultiScale indicates that the algorithm looks at subregions of the image at multiple scales, to detect faces of different sizes.

faces = faceCascade.detectMultiScale(gray,
                                     scaleFactor=1.1,
                                     minNeighbors=5,
                                     minSize=(60, 60),
                                     flags=cv2.CASCADE_SCALE_IMAGE)

Let us go through the arguments of this function:

  • scaleFactor – Parameter specifying how much the image size is reduced at each image scale. By rescaling the input image, you can shrink a larger face to a smaller one, making it detectable by the algorithm. 1.05 is a good possible value, which means you use a small step for resizing, i.e. reduce the size by 5% at each scale; this increases the chance of finding a size that matches the model used for detection.
  • minNeighbors – Parameter specifying how many neighbors each candidate rectangle should have to be retained. This parameter affects the quality of the detected faces: a higher value results in fewer detections but of higher quality. 3 to 6 is a good range for it (see the illustrative example after this list).
  • flags – Mode of operation.
  • minSize – Minimum possible object size. Objects smaller than this are ignored.
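As an illustrative, not prescriptive, variation, stricter values trade some sensitivity for fewer false positives:

# Purely illustrative, stricter settings: more agreeing neighbors and a larger
# minimum size give fewer spurious boxes, but small or distant faces may be missed.
faces = faceCascade.detectMultiScale(gray,
                                     scaleFactor=1.05,   # smaller scale steps: slower but more thorough
                                     minNeighbors=8,
                                     minSize=(100, 100),
                                     flags=cv2.CASCADE_SCALE_IMAGE)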

The variable faces now contains all the detections for the target image. Detections are saved as pixel coordinates. Each detection is defined by its top-left corner coordinates and the width and height of the rectangle that encompasses the detected face.

To show the detected face, we will draw a rectangle over it. OpenCV's rectangle() draws rectangles over images, and it needs to know the pixel coordinates of the top-left and bottom-right corners. The coordinates indicate the row and column of pixels in the image. We can easily get these coordinates from the variable faces.

for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

rectangle() accepts the following arguments:

  • The original image
  • The coordinates of the top-left point of the detection
  • The coordinates of the bottom-right point of the detection
  • The color of the rectangle (a tuple that defines the amount of red, green, and blue, each 0-255). In our case, we set it to green by keeping the green component at 255 and the rest at zero.
  • The thickness of the rectangle lines

Next, we just display the resulting frame and also set a way to exit this infinite loop and close the video feed. By pressing the 'q' key, we can exit the script.

    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

The next two lines are just to clean up: they release the webcam and close all the windows.

video_capture.release()
cv2.destroyAllWindows()

Here is the full code and output.

import cv2
import os
cascPath = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray,
                                         scaleFactor=1.1,
                                         minNeighbors=5,
                                         minSize=(60, 60),
                                         flags=cv2.CASCADE_SCALE_IMAGE)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Display the resulting frame
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()

Output:

Create a model to recognize faces wearing a mask

In this section, we are going to make a classifier that can differentiate between faces with masks and without masks. In case you want to skip this part, here is a link to download the pre-trained model. Save it and move on to the next section to learn how to use it to detect masks using OpenCV. Check out our collection of OpenCV courses to help you develop your skills and understand the topic better.

For creating this classifier, we need data in the form of images. Luckily we have a dataset containing images of faces with a mask and without a mask. Since these images are few in number, we cannot train a neural network from scratch. Instead, we finetune a pre-trained network called MobileNetV2, which is trained on the ImageNet dataset.

Let us first import all the necessary libraries we are going to need.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os

The next step is to read all the images and assign them to a list. Here we get all the paths associated with these images and then label them accordingly. Remember that our dataset is contained in two folders, with_masks and without_masks, so we can easily get the labels by extracting the folder name from the path. We also preprocess each image and resize it to 224x224 dimensions.

imagePaths = list(paths.list_images('/content/drive/My Drive/dataset'))
data = []
labels = []
# loop over the image paths
for imagePath in imagePaths:
    # extract the class label from the directory name in the path
    label = imagePath.split(os.path.sep)[-2]
    # load the input image (224x224) and preprocess it
    image = load_img(imagePath, target_size=(224, 224))
    image = img_to_array(image)
    image = preprocess_input(image)
    # update the data and labels lists, respectively
    data.append(image)
    labels.append(label)
# convert the data and labels to NumPy arrays
data = np.array(data, dtype="float32")
labels = np.array(labels)

The next step is to load the pre-trained model and customize it according to our problem. We just remove the top layers of this pre-trained model and add a few layers of our own. As you can see, the last layer has two nodes, as we have only two outputs. This is called transfer learning.

baseModel = MobileNetV2(weights="imagenet", include_top=False,
	input_shape=(224, 224, 3))
# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
	layer.trainable = False

Now we need to convert the labels into one-hot encoding. After that, we split the data into training and testing sets to evaluate the model. Also, the next step is data augmentation, which significantly increases the diversity of data available for training models without actually collecting new data. Data augmentation techniques such as cropping, rotation, shearing, and horizontal flipping are commonly used to train large neural networks.

lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.20, stratify=labels, random_state=42)
# construct the training image generator for data augmentation
aug = ImageDataGenerator(
	rotation_range=20,
	zoom_range=0.15,
	width_shift_range=0.2,
	height_shift_range=0.2,
	shear_range=0.15,
	horizontal_flip=True,
	fill_mode="nearest")

The next step is to compile the model and train it on the augmented data.

INIT_LR = 1e-4
EPOCHS = 20
BS = 32
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])
# train the head of the network
print("[INFO] training head...")
H = model.fit(
	aug.flow(trainX, trainY, batch_size=BS),
	steps_per_epoch=len(trainX) // BS,
	validation_data=(testX, testY),
	validation_steps=len(testX) // BS,
	epochs=EPOCHS)

Now that our model is trained, let us plot a graph to see its learning curve. We also save the model for later use. Here is a link to this trained model.

N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")

Output:

# To save the trained model
model.save('mask_recog_ver2.h5')
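As a quick sanity check before wiring the classifier into the video pipeline, you could load the saved file back and run it on a single image; the file name below matches the one saved above, while 'some_face.jpg' is just a placeholder path:

from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
import numpy as np

# 'some_face.jpg' is a placeholder; use any cropped face image you have on disk
model = load_model('mask_recog_ver2.h5')
image = load_img('some_face.jpg', target_size=(224, 224))
image = preprocess_input(img_to_array(image))
(mask, withoutMask) = model.predict(np.expand_dims(image, axis=0))[0]
print("Mask" if mask > withoutMask else "No Mask")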

How to do Real-time Mask detection

Before moving to the next part, make sure to download the above model from this link and place it in the same folder as the Python script in which you are going to write the code below.

Now that our model is trained, we can modify the code from the first section so that it can detect faces and also tell us whether the person is wearing a mask or not.

In order for our mask detector model to work, it needs images of faces. For this, we will detect the frames with faces using the method shown in the first section and then pass them to our model after preprocessing them. So let us first import all the libraries we need.

import cv2
import os
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
import numpy as np

The first few lines are exactly the same as in the first section. The only thing that is different is that we assign our pre-trained mask detector model to the variable model.

cascPath = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
model = load_model("mask_recog_ver2.h5")

video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray,
                                         scaleFactor=1.1,
                                         minNeighbors=5,
                                         minSize=(60, 60),
                                         flags=cv2.CASCADE_SCALE_IMAGE)

Next, we define some lists. The faces_list contains all the faces that are detected by the faceCascade model, and the preds list is used to store the predictions made by the mask detector model.

faces_list=[]
preds=[]

Also, since the faces variable contains the top-left corner coordinates, height, and width of the rectangles encompassing the faces, we can use those to get a crop of the face and then preprocess that crop so that it can be fed into the model for prediction. The preprocessing steps are the same as those followed when training the model in the second section. For example, the model is trained on RGB images, so we convert the image to RGB here.

    for (x, y, w, h) in faces:
        face_frame = frame[y:y+h, x:x+w]
        face_frame = cv2.cvtColor(face_frame, cv2.COLOR_BGR2RGB)
        face_frame = cv2.resize(face_frame, (224, 224))
        face_frame = img_to_array(face_frame)
        face_frame = np.expand_dims(face_frame, axis=0)
        face_frame = preprocess_input(face_frame)
        faces_list.append(face_frame)
        if len(faces_list) > 0:
            # stack the preprocessed faces into a single batch before predicting
            preds = model.predict(np.vstack(faces_list))
        # the last prediction corresponds to the face handled in this iteration;
        # mask holds the probability of wearing a mask, withoutMask the opposite
        (mask, withoutMask) = preds[-1]

After getting the predictions, we draw a rectangle over the face and put a label on it according to the predictions.

label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        cv2.putText(frame, label, (x, y- 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)

        cv2.rectangle(frame, (x, y), (x + w, y + h),color, 2)

The rest of the steps are the same as in the first section.

    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()

Here is the complete code and output:

import cv2
import os
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
import numpy as np

cascPath = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
model = load_model("mask_recog_ver2.h5")

video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray,
                                         scaleFactor=1.1,
                                         minNeighbors=5,
                                         minSize=(60, 60),
                                         flags=cv2.CASCADE_SCALE_IMAGE)
    faces_list=[]
    preds=[]
    for (x, y, w, h) in faces:
        face_frame = frame[y:y+h,x:x+w]
        face_frame = cv2.cvtColor(face_frame, cv2.COLOR_BGR2RGB)
        face_frame = cv2.resize(face_frame, (224, 224))
        face_frame = img_to_array(face_frame)
        face_frame = np.expand_dims(face_frame, axis=0)
        face_frame =  preprocess_input(face_frame)
        faces_list.append(face_frame)
        if len(faces_list) > 0:
            # stack the preprocessed faces into a single batch before predicting
            preds = model.predict(np.vstack(faces_list))
        # the last prediction corresponds to the face handled in this iteration
        (mask, withoutMask) = preds[-1]
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        cv2.putText(frame, label, (x, y- 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)

        cv2.rectangle(frame, (x, y), (x + w, y + h),color, 2)
    # Display the resulting frame
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()

Output:

This brings us to the end of this article, where we learned how to detect faces in real-time and also designed a model that can detect faces with masks. Using this model, we were able to turn the face detector into a mask detector.

Update: I trained another model which can classify images as wearing a mask, not wearing a mask, and not properly wearing a mask. Here is a link to the Kaggle notebook for this model. You can modify it and also download the model from there and use it instead of the model we trained in this article. Although this model is not as efficient as the model we trained here, it has the extra feature of detecting improperly worn masks.

If you are using this model, you need to make some minor changes to the code. Replace the previous lines with these lines.

# Here are the minor changes to the OpenCV code
for (box, pred) in zip(locs, preds):
    # unpack the bounding box and predictions
    (startX, startY, endX, endY) = box
    (mask, withoutMask, notproper) = pred

    # determine the class label and color we'll use to draw
    # the bounding box and text
    if (mask > withoutMask and mask > notproper):
        label = "Mask"
    elif (withoutMask > notproper and withoutMask > mask):
        label = "Without Mask"
    else:
        label = "Wear Mask Properly"

    if label == "Mask":
        color = (0, 255, 0)
    elif label == "Without Mask":
        color = (0, 0, 255)
    else:
        color = (255, 140, 0)

    # include the probability in the label
    label = "{}: {:.2f}%".format(label,
                                 max(mask, withoutMask, notproper) * 100)

    # display the label and bounding box rectangle on the output
    # frame
    cv2.putText(frame, label, (startX, startY - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
    cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

You can also upskill with Great Learning's PGP Artificial Intelligence and Machine Learning Course. The course offers mentorship from industry leaders, and you will also have the opportunity to work on real-time industry-relevant projects.

Further Reading

  1. Real-Time Object Detection Using TensorFlow
  2. YOLO object detection using OpenCV
  3. Object Detection in Pytorch | What’s Object Detection?
