Patrick McGuire: Pedestrian Detection OpenCV

Monday, November 9, 2015

Pedestrian Detection OpenCV

I’ve met a lot of amazing, uplifting people over the years. My PhD advisor who helped get me through graduate school. My father who was always there for me as a kid — and still is now. And my girlfriend who has always been positive, helpful, and supportive (even when I probably didn’t deserve it).

I’ve also met some demoralizing, discouraging ones. Family members who have gone out of their way to deter me from being an entrepreneur and working for myself. Colleagues who either disliked me or my work and chose to express their disdain in a public fashion. And then there are those who have said some pretty disheartening things over email, Twitter, and other internet outlets.

We’re all familiar with these types of people. Yet regardless of their demeanor (whether positive or negative), we’re all built from the same genetic material of four nucleobases: cytosine, guanine, adenine, and thymine.

These base pairs are combined in such a way that our bodies all have the same basic structure regardless of gender, race, or ethnicity. At the most structural level we all have a head, two arms, a torso, and two legs.

We can use computer vision to exploit this semi-rigid structure and extract features to quantify the human body. These features can be passed on to machine learning models that when trained can be used to detect and track humans in images and video streams. This is especially useful for the task of pedestrian detection, which is the topic we’ll be talking about in today’s blog post.

Read on to find out how you can use OpenCV and Python to perform pedestrian detection.

Looking for the source code to this post?
Jump right to the downloads section.

Pedestrian Detection OpenCV

Did you know that OpenCV has built-in methods to perform pedestrian detection?

OpenCV ships with a pre-trained HOG + Linear SVM model that can be used to perform pedestrian detection in both images and video streams. If you’re not familiar with the Histogram of Oriented Gradients and Linear SVM method, I suggest you read this blog post where I discuss the 6 step framework.

If you’re already familiar with the process (or if you just want to see some code on how pedestrian detection with OpenCV is done), just open up a new file, name it

detect.py

, and we’ll get coding:

# import the necessary packages
from __future__ import print_function
from imutils.object_detection import non_max_suppression
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

Lines 2-8 start by importing our necessary packages. We’ll import

print_function

to ensure our code is compatible with both Python 2.7 and Python 3 (this code will also work for OpenCV 2.4.X and OpenCV 3). From there, we’ll import the

non_max_suppression

function from my imutils package.

If you do not have

imutils

installed, let

pip

install it for you:

$ pip install imutils

If you do have

imutils

installed, you’ll need to upgrade to the latest version (v0.3.1) which includes the implementation of the

non_max_suppression

function, along with a few other minor updates:

$ pip install --upgrade imutils

I’ve talked about non-maxima suppression twice on the PyImageSearch blog, once in this introductory post, and again in this post on implementing a faster NMS algorithm. In either case, the gist of the non-maxima suppression algorithm is to take multiple, overlapping bounding boxes and reduce them to only a single bounding box:

Figure 3: (Left) Multiple bounding boxes are falsely detected for the person in the image. (Right) Apply non-maxima suppression allows us to suppress overlapping bounding boxes, leaving us with the correct final detection.

Figure 1: (Left) Multiple bounding boxes are falsely detected for the person in the image. (Right) Applying non-maxima suppression allows us to suppress overlapping bounding boxes, leaving us with the correct final detection.

This helps reduce the number of false-positives reported by the final object detector.

Lines 11-13 handle parsing our command line arguments. We only need a single switch here,

--images

, which is the path to the directory that contains the list of images we are going to perform pedestrian detection on.

Finally, Lines 16 and 17 initialize our pedestrian detector. First, we make a call to

hog = cv2.HOGDescriptor()

which initializes the Histogram of Oriented Gradients descriptor. Then, we call the

setSVMDetector

to set the Support Vector Machine to be pre-trained pedestrian detector, loaded via the

cv2.HOGDescriptor_getDefaultPeopleDetector()

function.

At this point our OpenCV pedestrian detector is fully loaded, we just need to apply it to some images:

# import the necessary packages
from __future__ import print_function
from imutils.object_detection import non_max_suppression
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize the HOG descriptor/person detector
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# loop over the image paths
for imagePath in paths.list_images(args["images"]):
        # load the image and resize it to (1) reduce detection time
        # and (2) improve detection accuracy
        image = cv2.imread(imagePath)
        image = imutils.resize(image, width=min(400, image.shape[1]))
        orig = image.copy()

        # detect people in the image
        (rects, weights) = hog.detectMultiScale(image, winStride=(4, 4),
                padding=(8, 8), scale=1.05)

        # draw the original bounding boxes
        for (x, y, w, h) in rects:
                cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 0, 255), 2)

        # apply non-maxima suppression to the bounding boxes using a
        # fairly large overlap threshold to try to maintain overlapping
        # boxes that are still people
        rects = np.array([[x, y, x + w, y + h] for (x, y, w, h) in rects])
        pick = non_max_suppression(rects, probs=None, overlapThresh=0.65)

        # draw the final bounding boxes
        for (xA, yA, xB, yB) in pick:
                cv2.rectangle(image, (xA, yA), (xB, yB), (0, 255, 0), 2)

        # show some information on the number of bounding boxes
        filename = imagePath[imagePath.rfind("/") + 1:]
        print("[INFO] {}: {} original boxes, {} after suppression".format(
                filename, len(rects), len(pick)))

        # show the output images
        cv2.imshow("Before NMS", orig)
        cv2.imshow("After NMS", image)
        cv2.waitKey(0)

On Line 20 we start looping over the images in our

--images

directory. The examples in this blog post (and the additional images included in the source code download of this article) are samples form the popular INRIA Person Dataset (specifically, from the GRAZ-01 subset).

From there, Lines 23-25 handle loading our image off disk and resizing it to have a maximum width of 400 pixels. The reason we attempt to reduce our image dimensions is two-fold:

Reducing image size ensures that less sliding windows in the image pyramid need to be evaluated (i.e., have HOG features extracted from and then passed on to the Linear SVM), thus reducing detection time (and increasing overall detection throughput).
Resizing our image also improves the overall accuracy of our pedestrian detection (i.e., less false-positives).

Actually detecting pedestrians in images is handled by Lines 28 and 29 by making a call to the

detectMultiScale

method of the

hog

descriptor. The

detectMultiScale

method constructs an image pyramid with

scale=1.05

and a sliding window step size of

(4, 4)

pixels in both the x and y direction, respectively.

The size of the sliding window is fixed at 32 x 128 pixels, as suggested by the seminal Dalal and Triggs paper, Histograms of Oriented Gradients for Human Detection. The

detectMultiScale

function returns a 2-tuple of

rects

, or the bounding box (x, y)-coordinates of each person in the image, and

weights

, the confidence value returned by the SVM for each detection.

A larger

scale

size will evaluate less layers in the image pyramid which can make the algorithm faster to run. However, having too large of a scale (i.e., less layers in the image pyramid) can lead to pedestrians not being detected. Similarly, having too small of a

scale

size dramatically increases the number of image pyramid layers that need to be evaluated. Not only can this be computationally wasteful, it can also dramatically increase the number of false-positives detected by the pedestrian detector. That said, the

scale

is one of the most important parameters to tune when performing pedestrian detection. I’ll perform a more thorough review of each of the parameters to

detectMultiScale

in a future blog post.

Lines 32 and 33 take our initial bounding boxes and draw them on our image.

However, for some images you’ll notice that there are multiple, overlapping bounding boxes detected for each person (as demonstrated by Figure 1 above).

In this case, we have two options. We can detect if one bounding box is fully contained within another (as one of the OpenCV examples implements). Or we can apply non-maxima suppression and suppress bounding boxes that overlap with a significant threshold — and that’s exactly what Lines 38 and 39 do.

Note: If you’re interested in learning more about the HOG framework and non-maxima suppression, I would start by reading this introductory post on the 6-step framework. From there, check out this post on simple non-maxima suppression followed by an updated post that implements the optimized Malisiewicz method.

After applying non-maxima suppression, we draw the finalized bounding boxes on Lines 42 and 43, display some basic information about the image and number of bounding boxes on Lines 46-48, and finally display our output images to our screen on Lines 51-53.

Results of pedestrian detection in images

To see our pedestrian detection script in action, just issue the following command:

$ python detect.py --images images

Below I have provided a sample of results from the detection script:

Figure 1: The first result of our pedestrian detection script.

Figure 2: The first result of our pedestrian detection script.

Here we have detected a single person standing next to a police car.

Figure 2: Detecting a single person in the foreground and another person in the background.

Figure 3: Detecting a single person in the foreground and another person in the background.

In the above example we can see a man detected in the foreground of the image, while a woman pushing a baby stroller is detected in the background.

Figure 3: An example of why applying non-maxima suppression is important.

Figure 4: An example of why applying non-maxima suppression is important.

The above image serves an example of why applying non-maxima suppression is important. The

detectMultiScale

function falsely detected two bounding boxes (along with the correct bounding box), both overlapping the true person in the image. By applying non-maxima suppression we were able to suppress the extraneous bounding boxes, leaving us with the true detection

Figure 4: A second example demonstrating non-maxima suppression in action.

Figure 5: A second example demonstrating non-maxima suppression in action.

Again, we see that multiple false bounding boxes are detected, but by applying NMS we can remove them, leaving us with the true detection in the image.

Figure 5: Detecting pedestrians in a shopping mall.

Figure 6: Detecting pedestrians in a shopping mall.

Here we are detecting pedestrians in a shopping mall. Notice two people are walking away from the camera while another is walking towards the camera. In either case, our HOG method is able to detect the people. The larger

overlapThresh

in the

non_maxima_suppression

function ensures that the bounding boxes are not suppressed, even though they do partially overlap.

Figure 6: Detecting people in a blurred image.

Figure 7: Detecting people in a blurred image.

I was particularly surprised by the results of the above image. Normally the HOG descriptor does not perform well in the presence of motion blur, yet we are still able to detect the pedestrians in this image.

Figure 7: Detecting pedestrians outdoors, walking along the street.

Figure 8: Detecting pedestrians outdoors, walking along the street.

This is another example of multiple, overlapping bounding boxes, but due to the larger

overlapThresh

they are not suppressed, leaving us with the correct person detections.

Figure 8: Detecting four members of a family.

Figure 9: Detecting four members of a family.

The above image shows the versatility of our HOG + SVM pedestrian detector. We are not only able to detect the adult male, but also the three small children as well. (Note that the detector is not able to find the other child hiding behind his [presumed to be] father).

Figure 9: Detecting a depiction of pedestrians.

Figure 10: Detecting a depiction of pedestrians.

I include this image last simply because I find it amusing. We are clearly viewing a road sign, likely used to indicate a pedestrian crossing. However, our HOG + SVM detector marks the two people in this image as positive classifications!

Summary

In this blog post we learned how to perform pedestrian detection using the OpenCV library and the Python programming language.

The OpenCV library actually ships with a pre-trained HOG + Linear SVM detector based on the Dalal and Triggs method to automatically detect pedestrians in images.

While the HOG method tends to be more accurate than its Haar counter-part, it still requires that the parameters to

detectMultiScale

be set properly. In future blog posts, I’ll review each of the parameters to

detectMultiScale

, detail how to tune each of them, and describe the trade-offs between accuracy and performance.

Anyway, I hope you enjoyed this article! I’m planning on doing more object detection tutorials in the future, so if you want to be notified when these posts go live, please consider subscribing to the newsletter using the form below.

I also cover object detection using the HOG + Linear SVM method in detail inside the PyImageSearch Gurus course, so be sure to take a look!