Patrick McGuire: Detecting machine-readable zones in passport images

Monday, November 30, 2015

Detecting machine-readable zones in passport images

Today’s blog post wouldn’t be possible without PyImageSearch Gurus member, Hans Boone. Hans is working on a computer vision project to automatically detect Machine-readable Zones (MRZs) in passport images — much like the region detected in the image above.

The MRZ region in passports or travel cards fall into two classes: Type 1 and Type 3. Type 1 MRZs are three lines, with each line containing 30 characters. The Type 3 MRZ only has two lines, but each line contains 44 characters. In either case, the MRZ encodes identifying information of a given citizen, including the type of passport, passport ID, issuing country, name, nationality, expiration date, etc.

Inside the PyImageSearch Gurus course, Hans showed me his progress on the project and I immediately became interested. I’ve always wanted to apply computer vision algorithms to passport images (mainly just for fun), but lacked the dataset to do so. Given the personal identifying information a passport contains, I obviously couldn’t write a blog post on the subject and share the images I used to develop the algorithm.

Luckily, Hans agreed to share some of the sample/specimen passport images he has access to — and I jumped at the chance to play with these images.

Now, before we get to far, it’s important to note that these passports are not “real” in the sense that they can be linked to an actual human being. But they are genuine passports that were generated using fake names, addresses, etc. for developers to work with.

You might think that in order to detect the MRZ region of a passport that you need a bit of machine learning, perhaps using the Linear SVM + HOG framework to construct an “MRZ detector” — but that would be overkill.

Instead, we can perform MRZ detection using only basic image processing techniques such as thresholding, morphological operations, and contour properties. In the remainder of this blog post, I’ll detail my own take on how to apply these methods to detect the MRZ region of a passport.

Looking for the source code to this post?
Jump right to the downloads section.

Detecting machine-readable zones in passport images

Let’s go ahead and get this project started. Open up a new file, name it

detect_mrz.py

, and insert the following code:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

Lines 2-6 import our necessary packages. I’ll assume you already have OpenCV installed. You’ll also need imutils, my collection of convenience functions to make basic image processing operations with OpenCV easier. You can install

imutils

using

pip

$ pip install imutils

From there, Lines 9-11 handle parsing our command line argument. We only need a single switch here,

--images

, which is the path to the directory containing the passport images we are going to process.

Finally, Lines 14 and 15 initialize two kernels which we’ll later use when applying morphological operations, specifically the closing operation. For the time being, simply note that the first kernel is rectangular with a width approximately 3x larger than the height. The second kernel is square. These kernels will allow us to close gaps between MRZ characters and openings between MRZ lines.

Now that our command line arguments are parsed, we can start looping over each of the images in our dataset and process them:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

Lines 20 and 21 loads our original image from disk and resizes it to have a maximum height of 600 pixels. You can see an example of an original image below:

Figure 1: Our original passport image that we are trying to detect the MRZ in.

Gaussian blurring is applied on Line 26 to reduce high frequency noise. We then apply a blackhat morphological operation to the blurred, grayscale image on Line 27.

A blackhat operator is used to reveal dark regions (i.e., MRZ text) against light backgrounds (i.e., the background of the passport itself). Since the passport text is always black on a light background (at least in terms of this dataset), a blackhat operation is appropriate. Below you can see the output of applying the blackhat operator:

Figure 2: Applying the blackhat morphological operator reveals the black MRZ text against the light passport background.

The next step in MRZ detection is to compute the gradient magnitude representation of the blackhat image using the Scharr operator:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

        # compute the Scharr gradient of the blackhat image and scale the
        # result into the range [0, 255]
        gradX = cv2.Sobel(blackhat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
        gradX = np.absolute(gradX)
        (minVal, maxVal) = (np.min(gradX), np.max(gradX))
        gradX = (255 * ((gradX - minVal) / (maxVal - minVal))).astype("uint8")

Here we compute the Scharr gradient along the x-axis of the blackhat image, revealing regions of the image that are not only dark against a light background, but also contain vertical changes in the gradient, such as the MRZ text region. We then take this gradient image and scale it back into the range [0, 255] using min/max scaling:

Figure 3: Applying Scharr operator to our blackhat image reveals regions that contain strong vertical changes in gradient.

While it isn’t entirely obvious why we apply this step, I will say that it’s extremely helpful in reducing false-positive MRZ detections. Without it, we can accidentally mark embellished or designed regions of the passport as the MRZ. I will leave this as an exercise to you to verify that computing the gradient of the blackhat image can improve MRZ detection accuracy.

The next step is to try to detect the actual lines of the MRZ:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

        # compute the Scharr gradient of the blackhat image and scale the
        # result into the range [0, 255]
        gradX = cv2.Sobel(blackhat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
        gradX = np.absolute(gradX)
        (minVal, maxVal) = (np.min(gradX), np.max(gradX))
        gradX = (255 * ((gradX - minVal) / (maxVal - minVal))).astype("uint8")

        # apply a closing operation using the rectangular kernel to close
        # gaps in between letters -- then apply Otsu's thresholding method
        gradX = cv2.morphologyEx(gradX, cv2.MORPH_CLOSE, rectKernel)
        thresh = cv2.threshold(gradX, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

First, we apply a closing operation using our rectangular kernel. This closing operation is meant to close gaps in between MRZ characters. We then apply thresholding using Otsu’s method to automatically threshold the image:

Figure 4: Applying a closing operation using a rectangular kernel (that is wider than it is tall) to close gaps in between the MRZ characters

As we can see from the figure above, each of the MRZ lines is present in our threshold map.

The next step is to close the gaps between the actual lines, giving us one large rectangular region that corresponds to the MRZ:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

        # compute the Scharr gradient of the blackhat image and scale the
        # result into the range [0, 255]
        gradX = cv2.Sobel(blackhat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
        gradX = np.absolute(gradX)
        (minVal, maxVal) = (np.min(gradX), np.max(gradX))
        gradX = (255 * ((gradX - minVal) / (maxVal - minVal))).astype("uint8")

        # apply a closing operation using the rectangular kernel to close
        # gaps in between letters -- then apply Otsu's thresholding method
        gradX = cv2.morphologyEx(gradX, cv2.MORPH_CLOSE, rectKernel)
        thresh = cv2.threshold(gradX, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

        # perform another closing operation, this time using the square
        # kernel to close gaps between lines of the MRZ, then perform a
        # series of erosions to break apart connected components
        thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, sqKernel)
        thresh = cv2.erode(thresh, None, iterations=4)

Here we perform another closing operation, this time using our square kernel. This kernel is used to close gaps between the individual lines of the MRZ, giving us one large region that corresponds to the MRZ. A series of erosions are then performed to break apart connected components that may have been joined during the closing operation. These erosions are also helpful in removing small blobs that are irrelevant to the MRZ.

Figure 5: A second closing operation is performed, this time using a square kernel to close the gaps in between individual MRZ lines.

For some passport scans, the border of the passport may have become attached to the MRZ region during the closing operations. To remedy this, we set 5% of the left and right borders of the image to zero (i.e., black):

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

        # compute the Scharr gradient of the blackhat image and scale the
        # result into the range [0, 255]
        gradX = cv2.Sobel(blackhat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
        gradX = np.absolute(gradX)
        (minVal, maxVal) = (np.min(gradX), np.max(gradX))
        gradX = (255 * ((gradX - minVal) / (maxVal - minVal))).astype("uint8")

        # apply a closing operation using the rectangular kernel to close
        # gaps in between letters -- then apply Otsu's thresholding method
        gradX = cv2.morphologyEx(gradX, cv2.MORPH_CLOSE, rectKernel)
        thresh = cv2.threshold(gradX, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

        # perform another closing operation, this time using the square
        # kernel to close gaps between lines of the MRZ, then perform a
        # serieso of erosions to break apart connected components
        thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, sqKernel)
        thresh = cv2.erode(thresh, None, iterations=4)

        # during thresholding, it's possible that border pixels were
        # included in the thresholding, so let's set 5% of the left and
        # right borders to zero
        p = int(image.shape[1] * 0.05)
        thresh[:, 0:p] = 0
        thresh[:, image.shape[1] - p:] = 0

You can see the output of our border removal below.

Figure 6: Setting 5% of the left and right border pixels to zero, ensuring that the MRZ region is not attached to the scanned margin of the passport.

Compared to Figure 5 above, you can now see that the border has been removed.

The last step is to find the contours in our thresholded image and use contour properties to identify the MRZ:

# import the necessary packages
from imutils import paths
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--images", required=True, help="path to images directory")
args = vars(ap.parse_args())

# initialize a rectangular and square structuring kernel
rectKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (13, 5))
sqKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))

# loop over the input image paths
for imagePath in paths.list_images(args["images"]):
        # load the image, resize it, and convert it to grayscale
        image = cv2.imread(imagePath)
        image = imutils.resize(image, height=600)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

        # smooth the image using a 3x3 Gaussian, then apply the blackhat
        # morphological operator to find dark regions on a light background
        gray = cv2.GaussianBlur(gray, (3, 3), 0)
        blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, rectKernel)

        # compute the Scharr gradient of the blackhat image and scale the
        # result into the range [0, 255]
        gradX = cv2.Sobel(blackhat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
        gradX = np.absolute(gradX)
        (minVal, maxVal) = (np.min(gradX), np.max(gradX))
        gradX = (255 * ((gradX - minVal) / (maxVal - minVal))).astype("uint8")

        # apply a closing operation using the rectangular kernel to close
        # gaps in between letters -- then apply Otsu's thresholding method
        gradX = cv2.morphologyEx(gradX, cv2.MORPH_CLOSE, rectKernel)
        thresh = cv2.threshold(gradX, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

        # perform another closing operation, this time using the square
        # kernel to close gaps between lines of the MRZ, then perform a
        # serieso of erosions to break apart connected components
        thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, sqKernel)
        thresh = cv2.erode(thresh, None, iterations=4)

        # during thresholding, it's possible that border pixels were
        # included in the thresholding, so let's set 5% of the left and
        # right borders to zero
        p = int(image.shape[1] * 0.05)
        thresh[:, 0:p] = 0
        thresh[:, image.shape[1] - p:] = 0

        # find contours in the thresholded image and sort them by their
        # size
        cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
                cv2.CHAIN_APPROX_SIMPLE)[-2]
        cnts = sorted(cnts, key=cv2.contourArea, reverse=True)

        # loop over the contours
        for c in cnts:
                # compute the bounding box of the contour and use the contour to
                # compute the aspect ratio and coverage ratio of the bounding box
                # width to the width of the image
                (x, y, w, h) = cv2.boundingRect(c)
                ar = w / float(h)
                crWidth = w / float(gray.shape[1])

                # check to see if the aspect ratio and coverage width are within
                # acceptable criteria
                if ar > 5 and crWidth > 0.75:
                        # pad the bounding box since we applied erosions and now need
                        # to re-grow it
                        pX = int((x + w) * 0.03)
                        pY = int((y + h) * 0.03)
                        (x, y) = (x - pX, y - pY)
                        (w, h) = (w + (pX * 2), h + (pY * 2))

                        # extract the ROI from the image and draw a bounding box
                        # surrounding the MRZ
                        roi = image[y:y + h, x:x + w].copy()
                        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
                        break

        # show the output images
        cv2.imshow("Image", image)
        cv2.imshow("ROI", roi)
        cv2.waitKey(0)

On Line 56 and 57 we compute the contours (i.e., outlines) of our thresholded image. We then take these contours and sort them based on their size in descending order (implying that the largest contours are first in the list).

On Line 61 we start looping over our sorted list of contours. For each of these contours, we’ll compute the bounding box (Line 65) and use it to compute two properties: the aspect ratio and the coverage ratio. The aspect ratio is simply the width of the bounding box divided by the height. The coverage ratio is the width of the bounding box divided by the width of the actual image.

Using these two properties we can make a check on Line 71 to see if we are examining the MRZ region. The MRZ is rectangular, with a width that is much larger than the height. The MRZ should also span at least 75% of the input image.

Provided these two cases hold, Lines 72-82 use the (x, y)-coordinates of the bounding box to extract the MRZ and draw the bounding box on our input image.

Finally, Lines 86-88 display our results.

Results

To see our MRZ detector in action, just execute the following command:

$ python detect_mrz.py --images examples

Below you can see of an example of a successful MRZ detection, with the MRZ outlined in green:

Figure 7: On the left, we have our input image. And on the right, we have the MRZ region that has been successfully detected.

Here is another example of detecting the Machine-readable Zone in a passport image using Python and OpenCV:

Figure 8: Applying MRZ detection to a scanned passport.

It doesn’t matter if the MRZ region is at the top or the bottom of the image. By applying morphological operations, extracting contours, and computing contour properties, we are able to extract the MRZ without a problem.

The same is true for the following image:

Figure 9: Detecting machine-readable zones in images using computer vision.

Let’s give another image a try:

Figure 10: Again, we are able to detect the MRZ in the passport scan using basic image processing techniques.

Up until now we have only seen Type 1 MRZs that contain three lines. However, our method works just as well with Type 3 MRZs that contain only two lines:

Figure 11: Detecting the MRZ in a Type 3 passport image using Python and OpenCV.

Here’s another example of detecting a Type 3 MRZ:

Figure 12: Applying computer vision and image processing to detect machine-readable zones in images.

Summary

In this blog post we learned how to detect Machine-readable Zones (MRZs) in passport scans using only basic image processing techniques, namely:

Thresholding.
Gradients.
Morphological operations (specifically, closings and erosions).
Contour properties.

These operations, while simple, allowed us to detect the MRZ regions in images without having to rely on more advanced feature extraction and machine learning methods such as Linear SVM + HOG for object detection.

Remember, when faced with a challenging computer vision problem — always consider the problem and your assumptions! As this blog post demonstrates, you might be surprised what basic image processing functions used in tandem can accomplish.

Once again, a big thanks to PyImageSearch Gurus member, Hans Boone, who supplied us with these example passport images! Thanks Hans!