
Face Detection using Viola Jones Algorithm


Within the realm of computer vision, face detection stands as a fundamental and fascinating task. Detecting and locating faces in images or video streams forms the cornerstone of various applications, from facial recognition systems to digital image processing. Among the many algorithms developed to tackle this challenge, the Viola-Jones algorithm has emerged as a groundbreaking approach renowned for its speed and accuracy.

The Viola-Jones algorithm, pioneered by Paul Viola and Michael Jones in 2001, revolutionized the field of face detection. Its efficient and robust methodology opened doors to a wide range of applications that depend on accurately identifying and analyzing human faces. By harnessing the power of Haar-like features, integral images, machine learning, and cascades of classifiers, the Viola-Jones algorithm showcases the synergy between computer science and image processing.

In this blog, we'll delve into the intricacies of the Viola-Jones algorithm, unraveling its underlying mechanisms and exploring its applications. From its training process to its implementation in real-world scenarios, we'll unlock the power of face detection and witness firsthand the transformative capabilities of the Viola-Jones algorithm.

Detecting face and eyes
  1. What is face detection?
  2. What is the Viola-Jones algorithm?
    1. What are Haar-Like Features?
    2. What are Integral Images?
    3. How is AdaBoost used in the Viola-Jones algorithm?
    4. What are Cascading Classifiers?
  3. Using a Viola-Jones Classifier to detect faces in a live webcam feed

What is face detection?

Object detection is one of the computer technologies connected to image processing and computer vision. It is concerned with detecting instances of an object such as human faces, buildings, trees, cars, etc. The primary aim of face detection algorithms is to determine whether there is any face in an image or not.

In recent years, we have seen significant advances in technologies that can detect and recognize faces. Our mobile cameras are often equipped with such technology, where we can see a box drawn around the faces. Although there are now quite advanced face detection algorithms, especially with the introduction of deep learning, the introduction of the Viola-Jones algorithm in 2001 was a breakthrough in this field. Now let us explore the Viola-Jones algorithm in detail.

What is the Viola-Jones algorithm?

The Viola-Jones algorithm is named after the two computer vision researchers who proposed the method in 2001, Paul Viola and Michael Jones, in their paper "Rapid Object Detection using a Boosted Cascade of Simple Features". Despite being a dated framework, Viola-Jones is quite powerful, and its application has proven to be exceptionally notable in real-time face detection. The algorithm is painfully slow to train but can detect faces in real time with impressive speed.

Given an image (the algorithm works on grayscale images), it looks at many smaller subregions and tries to find a face by looking for specific features in each subregion. It needs to check many different positions and scales because an image can contain many faces of various sizes. Viola and Jones used Haar-like features to detect faces in this algorithm, as the sketch below illustrates.
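To make that search concrete, here is a minimal sketch of a multi-scale sliding-window scan over a grayscale image; the base window size, step, and scale factor are illustrative assumptions, not values fixed by the original paper.

# A sketch of the multi-scale sliding-window search (illustrative values).
def sliding_windows(gray, base=24, scale=1.25, step=4):
    h, w = gray.shape
    size = base
    while size <= min(h, w):
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                # each subregion is later tested for face-like features
                yield x, y, size, gray[y:y + size, x:x + size]
        size = int(size * scale)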

The Viola-Jones algorithm has 4 major steps, which we will discuss in the sections to follow:

  1. Choosing Haar-like features
  2. Creating an integral image
  3. Running AdaBoost training
  4. Creating classifier cascades

What are Haar-Like Features?

In the early twentieth century, the Hungarian mathematician Alfréd Haar introduced the concept of Haar wavelets, a sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. Viola and Jones adapted the idea of using Haar wavelets and developed the so-called Haar-like features.

Haar-like features are digital image features used in object recognition. All human faces share some universal properties, such as the eye region being darker than its neighbouring pixels and the nose region being brighter than the eye region.

A simple way to find out which region is lighter or darker is to sum up the pixel values of both regions and compare them. The sum of pixel values in the darker region will be smaller than the sum of pixels in the lighter region. If one side is lighter than the other, it may be the edge of an eyebrow; sometimes the middle portion may be brighter than the surrounding boxes, which can be interpreted as a nose. This can be accomplished using Haar-like features, and with their help, we can interpret the different parts of a face.

There are 3 types of Haar-like features that Viola and Jones identified in their research:

  1. Edge features
  2. Line features
  3. Four-sided features

Edge features and line features are useful for detecting edges and lines respectively, while the four-sided features are used for finding diagonal features.

The value of a feature is calculated as a single number: the sum of pixel values in the black area minus the sum of pixel values in the white area. The value is zero for a plain surface in which all the pixels have the same value, and thus provides no useful information.

Since faces are complex shapes with darker and brighter spots, a Haar-like feature gives you a large number when the areas in the black and white rectangles are very different. Using this value, we get a piece of useful information out of the image.

To be useful, a Haar-like feature needs to give you a large number, meaning that the areas in the black and white rectangles are very different. There are known features that perform very well at detecting human faces.

For instance, when we apply an edge feature to the bridge of the nose, we get a strong response. Similarly, we combine many of these features to determine whether an image region contains a human face.
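As a concrete illustration, here is a minimal sketch of evaluating a two-rectangle (edge) Haar-like feature on a grayscale patch with NumPy; the patch values below are made up for the example.

# A minimal sketch of a two-rectangle (edge) Haar-like feature,
# assuming a grayscale patch stored as a NumPy array (values made up).
import numpy as np

def edge_feature_value(patch):
    """Sum of the top half minus sum of the bottom half of a patch."""
    half = patch.shape[0] // 2
    top = int(patch[:half, :].sum())
    bottom = int(patch[half:, :].sum())
    return top - bottom

# A patch with a dark top and a bright bottom, like an eye above a cheek.
patch = np.array([[ 10,  12,  11,  13],
                  [  9,  11,  10,  12],
                  [200, 198, 201, 199],
                  [202, 200, 203, 201]], dtype=np.uint8)

# Large magnitude = strong contrast between the two rectangles.
print(edge_feature_value(patch))  # -1516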

What are Integral Images?

In the previous section, we saw that to calculate the value of each feature, we need to perform computations on all the pixels inside that particular feature. In reality, these calculations can be very intensive, since the number of pixels is much greater when we are dealing with a large feature.

The integral image plays its part in allowing us to perform these intensive calculations quickly, so we can determine whether a feature, or several features, fits the criteria.

An integral image (also known as a summed-area table) is the name of both a data structure and an algorithm used to obtain this data structure. It is used as a fast and efficient way to calculate the sum of pixel values in an image or a rectangular part of an image.
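Here is a minimal NumPy sketch of building a summed-area table and using it to obtain any rectangle sum with just four lookups; the pixel values are illustrative.

# A minimal sketch of an integral image (summed-area table) with NumPy.
import numpy as np

img = np.arange(16, dtype=np.int64).reshape(4, 4)

# Pad with a leading row and column of zeros so the lookups stay simple.
ii = np.zeros((5, 5), dtype=np.int64)
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] using four corner lookups."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Four lookups replace summing every pixel in the rectangle.
assert rect_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()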

How is AdaBoost used in the Viola-Jones algorithm?

Next, we use a machine learning algorithm known as AdaBoost. But why do we even need an algorithm here?

The number of features present in a 24×24 detector window is nearly 160,000, but only a few of these features are important for identifying a face. So we use the AdaBoost algorithm to select the best features among the 160,000, as the sketch below illustrates.
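To see where a number of that order comes from, here is a small enumeration sketch of the basic upright feature shapes in a 24×24 window; the exact total depends on which shapes you include.

# A small enumeration of the basic upright Haar-feature shapes in a
# 24x24 window; exact totals depend on which shapes you include.
def count_features(win=24, bases=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    total = 0
    for bw, bh in bases:                       # base shape, e.g. a 2x1 edge feature
        for fw in range(bw, win + 1, bw):      # widths that the shape tiles evenly
            for fh in range(bh, win + 1, bh):  # heights likewise
                total += (win - fw + 1) * (win - fh + 1)  # positions at this size
    return total

print(count_features())  # 162336, on the order of the ~160,000 quoted above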

In the Viola-Jones algorithm, each Haar-like feature represents a weak learner. To decide the type and size of a feature that goes into the final classifier, AdaBoost checks the performance of all the classifiers that you supply to it.

To calculate the performance of a classifier, you evaluate it on all the subregions of all the images used for training. Some subregions will produce a strong response in the classifier; those will be classified as positives, meaning the classifier thinks they contain a human face. Subregions that don't produce a strong response don't contain a human face, in the classifier's opinion, and will be classified as negatives.

The classifiers that perform well are given higher importance or weight. The result is a strong classifier, also called a boosted classifier, that contains the best-performing weak classifiers.

So when we're training AdaBoost to identify important features, we feed it information in the form of training data and train it to learn from that information to predict. Ultimately, the algorithm sets a minimum threshold to determine whether something can be classified as a useful feature or not.
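For intuition, here is a hedged sketch of boosting with one-level decision stumps as the weak learners, using scikit-learn as a stand-in; the original Viola-Jones trainer implements its own AdaBoost variant over Haar-feature thresholds, and the data below is synthetic.

# A sketch of boosting with decision stumps as weak learners, using
# scikit-learn as a stand-in (not the Viola-Jones trainer itself).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # pretend columns are Haar-feature values
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # only features 0 and 3 matter

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # one threshold per weak learner
    n_estimators=50,          # ("estimator" is "base_estimator" in older sklearn)
)
clf.fit(X, y)

# Boosting concentrates weight on the informative features.
print(np.round(clf.feature_importances_, 2))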

What are Cascading Classifiers?

Suppose AdaBoost finally selects the best features, say around 2,500 of them; it is still a time-consuming process to calculate these features for every region. We have a 24×24 window which we slide over the input image, and we need to find out whether any of these regions contain a face. The job of the cascade is to quickly discard non-faces and avoid wasting precious time and computation, thus achieving the speed necessary for real-time face detection.

We set up a cascaded system in which we divide the process of identifying a face into multiple stages. In the first stage, we have a classifier made up of our best features; in other words, in the first stage, the subregion passes through the best features, such as the feature which identifies the nose bridge or the one which identifies the eyes. In the later stages, we have all the remaining features.

When an image subregion enters the cascade, it is evaluated by the first stage. If that stage evaluates the subregion as positive, meaning that it thinks it's a face, the output of the stage is "maybe".

When a subregion gets a "maybe", it is sent to the next stage of the cascade, and the process continues like this until we reach the last stage.

If all classifiers approve the image, it’s finally classified as a human face and is presented to the user as a detection.

Now how does this help us increase our speed? Basically, if the first stage gives a negative evaluation, the region is immediately discarded as not containing a human face. If it passes the first stage but fails the second, it is discarded as well. In short, a region can be discarded at any stage of the classifier, as sketched below.
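Here is a minimal sketch of that early-exit logic for one subwindow; the stage scoring functions and thresholds are hypothetical placeholders.

# Hypothetical cascade evaluation for one subwindow.
def passes_cascade(window, stages):
    """stages: list of (score_fn, threshold) pairs, cheap stages first."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False   # rejected early; most non-faces exit here
    return True            # survived every stage: report a face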

Using a Viola-Jones Classifier to detect faces in a live webcam feed

In this section, we are going to implement the Viola-Jones algorithm using OpenCV and detect faces in our webcam feed in real time. We will also use the same algorithm to detect a person's eyes. This is quite simple, and all you need is OpenCV and Python installed on your PC. You can refer to this article to learn about OpenCV and how to install it.

OpenCV ships with several trained Haar cascade models saved as XML files. Instead of creating and training a model from scratch, we use these files. We are going to use the "haarcascade_frontalface_alt2.xml" file in this project. Now let us start coding.

The first step is to find the paths to the "haarcascade_frontalface_alt2.xml" and "haarcascade_eye_tree_eyeglasses.xml" files. We do this using Python's os module.

import os
import cv2  # needed here for cv2.__file__

cascPathface = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
cascPatheyes = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_eye_tree_eyeglasses.xml"

The next step is to load our classifiers. We are using two classifiers, one for detecting the face and the other for detecting the eyes. The path to each XML file goes as an argument to OpenCV's CascadeClassifier() method.

faceCascade = cv2.CascadeClassifier(cascPathface)
eyeCascade = cv2.CascadeClassifier(cascPatheyes)
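One optional addition, not part of the original script: CascadeClassifier() does not raise an error on a bad path, it just yields an empty classifier, so a quick sanity check can save debugging time.

# Sanity check: an empty classifier means the XML file was not found/loaded.
if faceCascade.empty() or eyeCascade.empty():
    raise IOError("Could not load Haar cascade XML files")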

After loading the classifiers, let us open the webcam using this simple OpenCV one-liner:

video_capture = cv2.VideoCapture(0)

Next, we need to get the frames from the webcam stream; we do this using the read() function. We use an infinite loop to keep grabbing frames until we want to close the stream.

while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()

The read() function returns, in this order (matching `ret, frame` above):

  1. A return code
  2. The actual video frame read (one frame on each loop iteration)

The return code tells us if we have run out of frames, which will happen if we are reading from a file. This doesn't matter when reading from the webcam, since we can record endlessly, so we'll ignore it.
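If you do later point VideoCapture at a file, a minimal guard inside the loop would look like this (an addition, not part of the original script):

ret, frame = video_capture.read()
if not ret:   # no frame returned: end of file or camera error
    break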

For this particular classifier to work, we need to convert the frame to grayscale.

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

The faceCascade object has a method detectMultiScale(), which receives a frame (image) as an argument and runs the classifier cascade over the image. The term MultiScale indicates that the algorithm looks at subregions of the image at multiple scales, to detect faces of varying sizes.

faces = faceCascade.detectMultiScale(gray,
                                         scaleFactor=1.1,
                                         minNeighbors=5,
                                         minSize=(60, 60),
                                         flags=cv2.CASCADE_SCALE_IMAGE)

Let us go through the arguments of this function:

  • scaleFactor – Parameter specifying how much the image size is reduced at each image scale. By rescaling the input image, you can resize a larger face to a smaller one, making it detectable by the algorithm. 1.05 is a possible value for this, which means you use a small step for resizing, i.e. reduce the size by 5%; this increases the chance of finding a size that matches the model used for detection.
  • minNeighbors – Parameter specifying how many neighbours each candidate rectangle should have to retain it. This parameter affects the quality of the detected faces. A higher value results in fewer detections but with higher quality. 3–6 is a good value for it.
  • flags – Mode of operation.
  • minSize – Minimum possible object size. Objects smaller than this are ignored.

The variable faces now contains all the detections for the target image. Detections are saved as pixel coordinates: each detection is defined by its top-left corner coordinates and the width and height of the rectangle that encloses the detected face.

To show the detected face, we'll draw a rectangle over it. OpenCV's rectangle() draws rectangles over images, and it needs to know the pixel coordinates of the top-left and bottom-right corners. The coordinates indicate the row and column of pixels in the image. We can easily get these coordinates from the variable faces.

Also, now that we know the location of the face, we define a new region that contains just the face of the person and call it faceROI. In faceROI we detect the eyes and encircle them using the circle() function.

for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    faceROI = frame[y:y+h, x:x+w]   # region of interest: just the face
    eyes = eyeCascade.detectMultiScale(faceROI)
    for (x2, y2, w2, h2) in eyes:
        # eye coordinates are relative to faceROI, so add the face offset
        eye_center = (x + x2 + w2 // 2, y + y2 + h2 // 2)
        radius = int(round((w2 + h2) * 0.25))
        frame = cv2.circle(frame, eye_center, radius, (255, 0, 0), 4)

The function rectangle() accepts the following arguments:

  • The original image
  • The coordinates of the top-left point of the detection
  • The coordinates of the bottom-right point of the detection
  • The color of the rectangle: a tuple that defines the amount of blue, green, and red (0–255), since OpenCV uses BGR order. In our case, we set green by keeping the green component at 255 and the rest at zero.
  • The thickness of the rectangle lines

Next, we just display the resulting frame and also set a way to exit this infinite loop and close the video feed. By pressing the 'q' key, we can exit the script.

cv2.imshow('Video', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
    break

The next two lines just clean up and release the video capture.

video_capture.release()
cv2.destroyAllWindows()

Here is the full code.

import cv2
import os
cascPathface = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_frontalface_alt2.xml"
cascPatheyes = os.path.dirname(
    cv2.__file__) + "/data/haarcascade_eye_tree_eyeglasses.xml"

faceCascade = cv2.CascadeClassifier(cascPathface)
eyeCascade = cv2.CascadeClassifier(cascPatheyes)

video_capture = cv2.VideoCapture(0)
while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray,
                                         scaleFactor=1.1,
                                         minNeighbors=5,
                                         minSize=(60, 60),
                                         flags=cv2.CASCADE_SCALE_IMAGE)
    for (x,y,w,h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h),(0,255,0), 2)
        faceROI = frame[y:y+h,x:x+w]
        eyes = eyeCascade.detectMultiScale(faceROI)
        for (x2, y2, w2, h2) in eyes:
            eye_center = (x + x2 + w2 // 2, y + y2 + h2 // 2)
            radius = int(round((w2 + h2) * 0.25))
            frame = cv2.circle(frame, eye_center, radius, (255, 0, 0), 4)

    # Display the resulting frame
    cv2.imshow('Video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()


This brings us to the end of this article, where we learned about the Viola-Jones algorithm and its implementation in OpenCV.
