Today’s blog post is the long-awaited tutorial on real-time drowsiness detection on the Raspberry Pi!
Back in May I wrote a (laptop-based) drowsiness detector that could detect if the driver of a motor vehicle was getting tired and potentially falling asleep at the wheel.
The driver drowsiness detector project was inspired by a conversation I had with my Uncle John, a long haul truck driver who has witnessed more than a few accidents caused by fatigued drivers.
The post was really popular and a lot of readers got value out of it…
…but the method was not optimized for the Raspberry Pi!
Since then readers have been asking me to write a follow-up blog post covering the optimizations necessary to run the drowsiness detector on the Raspberry Pi.
I caught up with my Uncle John a few weeks ago and asked him what he would think of a small computer that could be mounted inside his truck cab to help determine if he was getting tired at the wheel.
He wasn’t crazy about the idea of being monitored by a camera his entire work day (and I don’t blame him; I wouldn’t want to be monitored all the time either). But he did eventually concede that a device like this, ideally a less invasive one, would certainly help avoid accidents caused by fatigued drivers.
To learn more about these facial landmark optimizations and how to run our drowsiness detector on the Raspberry Pi, just keep reading!
Looking for the source code to this post?
Jump right to the downloads section.
Raspberry Pi: Facial landmarks + drowsiness detection with OpenCV and dlib
Today’s tutorial is broken into four parts:
- Discussing the tradeoffs between Haar cascades and HOG + Linear SVM detectors.
- Examining the TrafficHAT used to create the alarm that will sound if a driver/user gets tired.
- Implementing dlib facial landmark optimizations so we can deploy our drowsiness detector to the Raspberry Pi.
- Viewing the results of our optimized driver drowsiness detection algorithm on the Raspberry Pi.
Before we get started I would highly encourage you to read through my previous tutorial on Drowsiness detection with OpenCV.
While I’ll be reviewing the code in its entirety here, you should still read the previous post as I discuss the actual Eye Aspect Ratio (EAR) algorithm in more detail.
The EAR algorithm is responsible for detecting driver drowsiness.
Haar cascades: less accurate, but faster than HOG
The major optimization we need to run our driver drowsiness detection algorithm on the Raspberry Pi is to swap out the default dlib HOG + Linear SVM face detector and replace it with OpenCV’s Haar cascade face detector.
While HOG + Linear SVM detectors tend to be significantly more accurate than Haar cascades, the cascade method is much faster than HOG + Linear SVM detection algorithms.
A complete review of how both HOG + Linear SVM and Haar cascades work is outside the scope of this blog post, but I would encourage you to:
- Read this post on Histogram of Oriented Gradients and Object Detection where I discuss the pros and cons of HOG + Linear SVM and Haar cascades.
- Work through the PyImageSearch Gurus course where I demonstrate how to implement your own custom HOG + Linear SVM object detectors from scratch.
The Raspberry Pi TrafficHAT
In our previous tutorial on drowsiness detection I used my laptop to execute driver drowsiness detection code — this enabled me to:
- Ensure the drowsiness detection algorithm would run in real-time due to the faster hardware.
- Use the laptop speaker to sound an alarm by playing a .WAV file.
The Raspberry Pi does not have a speaker so we cannot play any loud alarms to wake up the driver…
…but the Raspberry Pi is a highly versatile piece of hardware that includes a large array of hardware add-ons.
One of my favorites is the TrafficHAT:
The TrafficHAT includes:
- Three LED lights
- A button
- A loud buzzer (which we’ll be using as our alarm)
This kit is an excellent starting point for getting some exposure to GPIO programming. If you’re just getting started with GPIO as well, be sure to take a look at the TrafficHAT.
You don’t have to use the TrafficHAT of course; any other piece of hardware that emits a loud noise will do.
Another approach I like is to plug a 3.5mm audio cable into the audio jack and set up text-to-speech using `espeak` (a package available via `apt-get`). Using this method you could have your Pi say “WAKEUP WAKEUP!” when you’re drowsy. I’ll leave this as an exercise for you to implement if you so choose.
However, for the sake of this tutorial I will be using the TrafficHAT. You can buy your own TrafficHAT here.
From there you can install the Python packages required to use the TrafficHAT via `pip`. But first, ensure you’re in the appropriate virtual environment on your Pi. I have a thorough explanation of virtual environments in this previous post.
Here are the installation steps upon opening a terminal or SSH connection:
```
$ workon cv
$ pip install RPi.GPIO
$ pip install gpiozero
```
From there, if you want to check that everything is installed properly in your virtual environment you may run the Python interpreter directly:
```
$ workon cv
$ python
>>> import RPi.GPIO
>>> import gpiozero
>>> import numpy
>>> import dlib
>>> import cv2
>>> import imutils
```
Note: I’ve made the assumption that the virtual environment you are using already has the above packages installed in it. My `cv` virtual environment has NumPy, dlib, OpenCV, and imutils already installed, so by using `pip` to install `RPi.GPIO` and `gpiozero`, I’m able to access all six libraries from within the same environment. You may `pip install` each of the packages (except for OpenCV). To install an optimized OpenCV on your Raspberry Pi, just follow this previous post. If you are having trouble getting dlib installed, please follow this guide.
The driver drowsiness detection algorithm is identical to the one we implemented in our previous tutorial.
To start, we will apply OpenCV’s Haar cascades to detect the face in an image, which boils down to finding the bounding box (x, y)-coordinates of the face in the frame.
Given the bounding box of the face, we can apply dlib’s facial landmark predictor to obtain 68 salient points used to localize the eyes, eyebrows, nose, mouth, and jawline:
As I discuss in this tutorial, dlib’s 68 facial landmarks are indexable which enables us to extract the various facial structures using simple Python array slices.
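To make the index-based slicing concrete, here is a small sketch. The index ranges below mirror the standard grouping for dlib's 68-point model (the same mapping that `imutils.face_utils.FACIAL_LANDMARKS_IDXS` exposes); the landmark array itself is a placeholder, not real detector output:

```python
import numpy as np

# standard index ranges for dlib's 68-point facial landmark model,
# reproduced here as a plain dict for illustration
FACIAL_LANDMARKS_IDXS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}

# given a (68, 2) array of landmark coordinates, a simple array
# slice extracts any facial structure
shape = np.zeros((68, 2))  # placeholder landmark array
(lStart, lEnd) = FACIAL_LANDMARKS_IDXS["left_eye"]
leftEye = shape[lStart:lEnd]
print(leftEye.shape)  # (6, 2) -- each eye is described by six landmarks
```

Notice that each eye region is exactly six points, which is what the Eye Aspect Ratio computation below relies on.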
Given the facial landmarks associated with an eye, we can apply the Eye Aspect Ratio (EAR) algorithm, which was introduced by Soukupová and Čech in their 2016 paper, Real-Time Eye Blink Detection using Facial Landmarks:
On the top-left we have an eye that is fully open and the eye facial landmarks plotted. Then on the top-right we have an eye that is closed. The bottom then plots the eye aspect ratio over time. As we can see, the eye aspect ratio is constant (indicating that the eye is open), then rapidly drops to close to zero, then increases again, indicating a blink has taken place.
You can read more about the blink detection algorithm and the eye aspect ratio in this post dedicated to blink detection.
In our drowsiness detector case, we’ll be monitoring the eye aspect ratio to see if the value falls but does not increase again, thus implying that the driver/user has closed their eyes.
Once implemented, our algorithm will start by localizing the facial landmarks and extracting the eye regions:
We can then monitor the eye aspect ratio to determine if the eyes are closed:
And finally, it will raise an alarm if the eye aspect ratio is below a pre-defined threshold for a sufficiently long amount of time (indicating that the driver/user is tired):
In the next section, we’ll implement the optimized drowsiness detection algorithm detailed above on the Raspberry Pi using OpenCV, dlib, and Python.
A real-time drowsiness detector on the Raspberry Pi with OpenCV and dlib
Open up a new file in your favorite editor or IDE, name it `pi_drowsiness_detection.py`, and let’s get started coding:
```python
# import the necessary packages
from imutils.video import VideoStream
from imutils import face_utils
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2
```
Lines 1-9 handle our imports — make sure you have each of these installed in your virtual environment.
From there let’s define a distance function:
```python
def euclidean_dist(ptA, ptB):
	# compute and return the euclidean distance between the two
	# points
	return np.linalg.norm(ptA - ptB)
```
On Lines 11-14 we define a convenience function for calculating the Euclidean distance using NumPy. Euclidean distance is arguably the most well known and most used distance metric; it is normally described as the distance between two points “as the crow flies”.
Now let’s define our Eye Aspect Ratio (EAR) function which is used to compute the ratio of distances between the vertical eye landmarks and the distances between the horizontal eye landmarks:
```python
def eye_aspect_ratio(eye):
	# compute the euclidean distances between the two sets of
	# vertical eye landmarks (x, y)-coordinates
	A = euclidean_dist(eye[1], eye[5])
	B = euclidean_dist(eye[2], eye[4])

	# compute the euclidean distance between the horizontal
	# eye landmark (x, y)-coordinates
	C = euclidean_dist(eye[0], eye[3])

	# compute the eye aspect ratio
	ear = (A + B) / (2.0 * C)

	# return the eye aspect ratio
	return ear
```
The return value will be approximately constant when the eye is open and will decrease towards zero during a blink. If the eye is closed, the eye aspect ratio will remain constant at a much smaller value.
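To see the EAR behave as described, here is a quick worked example using made-up landmark coordinates (the point values are illustrative only, not real detector output):

```python
import numpy as np

def euclidean_dist(ptA, ptB):
    # distance between two (x, y) points
    return np.linalg.norm(ptA - ptB)

def eye_aspect_ratio(eye):
    # ratio of the two vertical landmark distances to the
    # horizontal landmark distance
    A = euclidean_dist(eye[1], eye[5])
    B = euclidean_dist(eye[2], eye[4])
    C = euclidean_dist(eye[0], eye[3])
    return (A + B) / (2.0 * C)

# an "open" eye: the vertical landmark pairs are well separated
open_eye = np.array(
    [(0, 3), (2, 4), (4, 4), (6, 3), (4, 2), (2, 2)], dtype="float")
# the same eye "closed": the lids are nearly touching
closed_eye = np.array(
    [(0, 3), (2, 3.1), (4, 3.1), (6, 3), (4, 2.9), (2, 2.9)], dtype="float")

print(eye_aspect_ratio(open_eye))    # ~0.333
print(eye_aspect_ratio(closed_eye))  # ~0.033
```

The open eye yields an EAR of roughly 0.33, while the closed eye drops by an order of magnitude, which is exactly the signal we threshold on.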
From there, we need to parse our command line arguments:
```python
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=True,
	help="path to where the face cascade resides")
ap.add_argument("-p", "--shape-predictor", required=True,
	help="path to facial landmark predictor")
ap.add_argument("-a", "--alarm", type=int, default=0,
	help="boolean used to indicate if TrafficHat should be used")
args = vars(ap.parse_args())
```
We have defined two required arguments and one optional one on Lines 33-40:

- `--cascade`: The path to the Haar cascade XML file used for face detection.
- `--shape-predictor`: The path to the dlib facial landmark predictor file.
- `--alarm`: A boolean to indicate if the TrafficHAT buzzer should be used when drowsiness is detected.
Both the `--cascade` and `--shape-predictor` files are available in the “Downloads” section at the end of the post.

If the `--alarm` flag is set, we’ll set up the TrafficHAT:
```python
# check to see if we are using GPIO/TrafficHat as an alarm
if args["alarm"] > 0:
	from gpiozero import TrafficHat
	th = TrafficHat()
	print("[INFO] using TrafficHat alarm...")
```
As shown on Lines 43-46, if the argument supplied is greater than 0, we’ll import the TrafficHat class and instantiate it to handle our buzzer alarm.
Let’s also define a set of important configuration variables:
```python
# define two constants, one for the eye aspect ratio to indicate
# blink and then a second constant for the number of consecutive
# frames the eye must be below the threshold for to set off the
# alarm
EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 16

# initialize the frame counter as well as a boolean used to
# indicate if the alarm is going off
COUNTER = 0
ALARM_ON = False
```
The two constants on Lines 52 and 53 define the EAR threshold and number of consecutive frames eyes must be closed to be considered drowsy, respectively.
Then we initialize the frame counter and a boolean for the alarm (Lines 57 and 58).
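When tuning `EYE_AR_CONSEC_FRAMES`, it helps to translate the frame count into seconds using your pipeline's measured throughput. The FPS figure below is a placeholder assumption (measure your own on the Pi), but the arithmetic is what matters:

```python
# rough conversion between the consecutive-frame threshold and
# real time; measure your actual pipeline FPS on the Pi first
EYE_AR_CONSEC_FRAMES = 16
measured_fps = 16.0  # placeholder: your Pi's measured processing rate

# how long the eyes must stay closed before the alarm fires
seconds_eyes_closed = EYE_AR_CONSEC_FRAMES / measured_fps
print(seconds_eyes_closed)  # 1.0

# going the other way: pick a target duration, derive the frame count
target_seconds = 1.5
frames_needed = int(round(target_seconds * measured_fps))
print(frames_needed)  # 24
```

If your Pi processes fewer frames per second than assumed, the same constant corresponds to a longer real-time delay, so re-check this after any pipeline change.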
From there we’ll load our Haar cascade and facial landmark predictor files:
```python
# load OpenCV's Haar cascade for face detection (which is faster than
# dlib's built-in HOG detector, but less accurate), then create the
# facial landmark predictor
print("[INFO] loading facial landmark predictor...")
detector = cv2.CascadeClassifier(args["cascade"])
predictor = dlib.shape_predictor(args["shape_predictor"])
```
Line 64 differs from the face detector initialization in our previous post on drowsiness detection: here we use a faster detection algorithm (Haar cascades) while sacrificing accuracy. Haar cascades are faster than dlib’s HOG + Linear SVM-based face detector, making them a great choice for the Raspberry Pi.
There are no changes to Line 65, where we load dlib’s `shape_predictor` while providing the path to the file.
Next, we’ll initialize the indexes of the facial landmarks for each eye:
```python
# grab the indexes of the facial landmarks for the left and
# right eye, respectively
(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
```
Here we supply array slice indexes in order to extract the eye regions from the set of facial landmarks.
We’re now ready to start our video stream thread:
```python
# start the video stream thread
print("[INFO] starting video stream thread...")
vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
time.sleep(1.0)
```
If you are using the PiCamera module, be sure to comment out Line 74 and uncomment Line 75 to switch the video stream to the Raspberry Pi camera. Otherwise if you are using a USB camera, you can leave this unchanged.
We sleep for one second so the camera sensor can warm up.
From there let’s loop over the frames from the video stream:
```python
# loop over frames from the video stream
while True:
	# grab the frame from the threaded video stream, resize it,
	# and convert it to grayscale (face detection only needs the
	# gray channel)
	frame = vs.read()
	frame = imutils.resize(frame, width=450)
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

	# detect faces in the grayscale frame
	rects = detector.detectMultiScale(gray, scaleFactor=1.1,
		minNeighbors=5, minSize=(30, 30),
		flags=cv2.CASCADE_SCALE_IMAGE)
```
The beginning of this loop should look familiar if you’ve read the previous post. We read a frame, resize it (for efficiency), and convert it to grayscale (Lines 83-85).
Then we detect faces in the grayscale image with our detector on Lines 88-90.
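The `detectMultiScale` parameters are where Haar cascades trade speed for accuracy. As a reference sketch (the values mirror the snippet above; tune them for your own camera and scene), here is what each one controls:

```python
# the detectMultiScale parameters used above, with the tradeoff
# each one controls noted alongside
haar_params = {
    # image pyramid step: smaller values scan more scales, which
    # is slower but finds more faces
    "scaleFactor": 1.1,
    # neighboring detections required to keep a box: higher values
    # mean fewer false positives but more missed faces
    "minNeighbors": 5,
    # ignore detections smaller than this many pixels
    "minSize": (30, 30),
}
print(sorted(haar_params))  # ['minNeighbors', 'minSize', 'scaleFactor']
```

On the Pi, bumping `scaleFactor` up (say, toward 1.2-1.3) is a common way to buy extra frames per second at the cost of occasionally missing a face.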
Now let’s loop over the detections:
```python
	# loop over the face detections
	for (x, y, w, h) in rects:
		# construct a dlib rectangle object from the Haar cascade
		# bounding box
		rect = dlib.rectangle(int(x), int(y), int(x + w),
			int(y + h))

		# determine the facial landmarks for the face region, then
		# convert the facial landmark (x, y)-coordinates to a NumPy
		# array
		shape = predictor(gray, rect)
		shape = face_utils.shape_to_np(shape)
```
Line 93 begins a lengthy for-loop, which is broken down into several code blocks here. First we extract the coordinates and width + height of the `rects` detections. Then, on Lines 96 and 97, we construct a dlib `rectangle` object using the information extracted from the Haar cascade bounding box.
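The conversion between the two box formats is simple arithmetic: OpenCV's Haar cascade returns boxes as (x, y, width, height), while dlib wants corner coordinates. A plain-Python sketch of what the `dlib.rectangle(...)` call above receives (the helper name is my own, for illustration):

```python
def haar_box_to_corners(x, y, w, h):
    # convert an OpenCV-style (x, y, width, height) box into the
    # (left, top, right, bottom) corners that dlib.rectangle expects
    return (int(x), int(y), int(x + w), int(y + h))

print(haar_box_to_corners(120, 80, 150, 150))  # (120, 80, 270, 230)
```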
From there, we determine the facial landmarks for the face region (Line 102) and convert the facial landmark (x, y)-coordinates to a NumPy array.
Given our NumPy array, `shape`, we can extract each eye’s coordinates and compute the EAR:
```python
		# extract the left and right eye coordinates, then use the
		# coordinates to compute the eye aspect ratio for both eyes
		leftEye = shape[lStart:lEnd]
		rightEye = shape[rStart:rEnd]
		leftEAR = eye_aspect_ratio(leftEye)
		rightEAR = eye_aspect_ratio(rightEye)

		# average the eye aspect ratio together for both eyes
		ear = (leftEAR + rightEAR) / 2.0
```
Utilizing the indexes of the eye landmarks, we can slice the `shape` array to obtain the (x, y)-coordinates of each eye (Lines 107 and 108).
We then calculate the EAR for each eye on Lines 109 and 110.
Soukupová and Čech recommend averaging both eye aspect ratios together to obtain a better estimation (Line 113).
This next block is strictly for visualization purposes:
```python
		# compute the convex hull for the left and right eye, then
		# visualize each of the eyes
		leftEyeHull = cv2.convexHull(leftEye)
		rightEyeHull = cv2.convexHull(rightEye)
		cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
		cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)
```
We can visualize each of the eye regions on our frame by using `cv2.drawContours` and supplying the `cv2.convexHull` calculation of each eye (Lines 117-120). These few lines are great for debugging our script but aren’t necessary if you are building an embedded product with no screen.
From there, we will check our eye aspect ratio (`ear`) and frame counter (`COUNTER`) to see if the eyes are closed, sounding the alarm to alert the drowsy driver if needed:
```python
		# check to see if the eye aspect ratio is below the blink
		# threshold, and if so, increment the blink frame counter
		if ear < EYE_AR_THRESH:
			COUNTER += 1

			# if the eyes were closed for a sufficient number of
			# frames, then sound the alarm
			if COUNTER >= EYE_AR_CONSEC_FRAMES:
				# if the alarm is not on, turn it on
				if not ALARM_ON:
					ALARM_ON = True

					# check to see if the TrafficHat buzzer should
					# be sounded
					if args["alarm"] > 0:
						th.buzzer.blink(0.1, 0.1, 10,
							background=True)

				# draw an alarm on the frame
				cv2.putText(frame, "DROWSINESS ALERT!", (10, 30),
					cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

		# otherwise, the eye aspect ratio is not below the blink
		# threshold, so reset the counter and alarm
		else:
			COUNTER = 0
			ALARM_ON = False
```
On Line 124 we check the `ear` against the `EYE_AR_THRESH`; if it is less than the threshold (the eyes are closed), we increment our `COUNTER` (Line 125) and subsequently check to see if the eyes have been closed for enough consecutive frames to sound the alarm (Line 129).
If the alarm isn’t on, we turn it on for a few seconds to wake up the drowsy driver. This is accomplished on Lines 136-138.
Optionally (if you’re implementing this code with a screen), you can draw the alarm on the frame as I have done on Lines 141 and 142.
That brings us to the case where `ear` wasn’t less than `EYE_AR_THRESH`; in that case we reset our `COUNTER` to 0 and make sure our alarm is turned off (Lines 146-148).
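The counter/alarm logic above is easy to reason about in isolation. Here is a minimal pure-Python version of the same state machine (`update_drowsiness` is my own name for it; the post keeps this logic inline in the frame loop):

```python
EYE_AR_THRESH = 0.3
EYE_AR_CONSEC_FRAMES = 16

def update_drowsiness(ear, counter, alarm_on):
    # mirror the in-loop logic: count consecutive below-threshold
    # frames, raise the alarm once the count is long enough, and
    # reset both as soon as the eyes reopen
    if ear < EYE_AR_THRESH:
        counter += 1
        if counter >= EYE_AR_CONSEC_FRAMES:
            alarm_on = True
    else:
        counter = 0
        alarm_on = False
    return counter, alarm_on

# simulate 20 frames of closed eyes, then one open-eye frame
counter, alarm_on = 0, False
for _ in range(20):
    counter, alarm_on = update_drowsiness(0.12, counter, alarm_on)
print(alarm_on)  # True -- fires after 16 consecutive closed frames

counter, alarm_on = update_drowsiness(0.35, counter, alarm_on)
print(alarm_on)  # False -- a single open-eye frame resets everything
```

This also makes the key design choice visible: brief blinks never trip the alarm, because a single frame above the threshold zeroes the counter.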
We’re almost done; in our last code block we’ll draw the EAR on the `frame`, display the `frame`, and do some cleanup:
```python
		# draw the computed eye aspect ratio on the frame to help
		# with debugging and setting the correct eye aspect ratio
		# thresholds and frame counters
		cv2.putText(frame, "EAR: {:.3f}".format(ear), (300, 30),
			cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)

	# show the frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
```
If you’re integrating with a screen or debugging you may wish to display the computed eye aspect ratio on the frame as I have done on Lines 153 and 154. The frame is displayed to the actual screen on Lines 157 and 158.
The program is stopped when the `q` key is pressed on a keyboard (Lines 161 and 162).
You might be thinking, “I won’t have a keyboard hooked up in my car!” Well, if you’re debugging using your webcam and your computer at your desk, you certainly do. If you want to use the button on the TrafficHAT to turn on/off the drowsiness detection algorithm, that is perfectly fine — the first reader to post the solution in the comments to using the button to turn on and off the drowsiness detector with the Pi deserves an ice cold craft beer or a hot artisan coffee.
Finally, we clean up by closing any open windows and stopping the video stream (Lines 165 and 166).
Drowsiness detection results
To run this program on your own Raspberry Pi, be sure to use the “Downloads” section at the bottom of this post to grab the source code, face detection Haar cascade, and dlib facial landmark detector.
I didn’t have enough time to wire everything up in my car and record the screen as I did previously. It would have been quite challenging to record the Raspberry Pi screen while driving as well.
Instead, I’ll demonstrate at my desk — you can then take this implementation and use it inside your own car for drowsiness detection as you see fit.
You can see an image of my setup below:
To run the program, simply execute the following command:
```
$ python pi_drowsiness_detection.py --cascade haarcascade_frontalface_default.xml \
	--shape-predictor shape_predictor_68_face_landmarks.dat --alarm 1
```
I have included a video of myself demoing the real-time drowsiness detector on the Raspberry Pi below:
Our Raspberry Pi 3 is able to accurately determine if I’m getting “drowsy”, thanks to our optimized code.
Disclaimer: I do not advise that you rely upon the hobbyist Raspberry Pi and this code to keep you awake at the wheel if you are in fact drowsy while driving. The best thing to do is to pull over and rest; walk around; or have a coffee/soda. Have fun with this project and show it off to your friends, but do not risk your life or that of others.
How do I run this program automatically when the Pi boots up?
This is a common question I receive. I have a blog post covering the answer here: Running a Python + OpenCV script on reboot.
Summary
In today’s blog post, we learned how to optimize facial landmarks on the Raspberry Pi by swapping out a HOG + Linear SVM-based face detector for a Haar cascade.
Haar cascades, while less accurate, are significantly faster than HOG + Linear SVM detectors.
Given the detections from the Haar cascade, we were able to construct a `dlib.rectangle` object corresponding to the bounding box (x, y)-coordinates in the image. This object was fed into dlib’s facial landmark predictor, which in turn gives us the set of localized facial landmarks on the face. From there, we applied the same algorithm we used in our previous post to detect drowsiness in a video stream.
I hope you enjoyed this tutorial!
To be notified when new blog posts are published here on the PyImageSearch blog, be sure to enter your email address in the form below — I’ll be sure to notify you when new content is released!
Downloads:
The post Raspberry Pi: Facial landmarks + drowsiness detection with OpenCV and dlib appeared first on PyImageSearch.