Patrick McGuire: ImageNet classification with Python and Keras

Wednesday, August 10, 2016

ImageNet classification with Python and Keras

Normally, I only publish blog posts on Monday, but I’m so excited about this one that it couldn’t wait and I decided to hit the publish button early.

You see, just a few days ago, François Chollet pushed three Keras models (VGG16, VGG19, and ResNet50) online — these networks are pre-trained on the ImageNet dataset, meaning that they can recognize 1,000 common object classes out-of-the-box.

To utilize these models in your own applications, all you need to do is:

Install Keras.
Clone the deep-learning-models repository.
Download the weights files for the pre-trained network(s) (which will be done automatically for you when you import and instantiate the respective network architecture).
Apply the pre-trained ImageNet networks to your own images.

It’s really that simple.

So, why is this so exciting? I mean, we’ve had the weights to popular pre-trained ImageNet classification networks for a while, right?

The problem is that these weight files are in Caffe format. While Caffe may be the current standard that many researchers use to construct, train, and evaluate new network architectures, it isn’t the most Python-friendly library in the world, at least in terms of constructing the network architecture itself.

Note: You can do some pretty cool stuff with the Caffe-Python bindings, but I’m mainly focusing on how Caffe architectures and the training process itself are defined via .prototxt configuration files rather than code that logic can be inserted into.

There is also the fact that there isn’t an easy or streamlined method to convert Caffe weights to a Keras-compatible model.

That’s all starting to change now — we can now easily apply VGG16, VGG19, and ResNet50 using Keras and Python to our own applications without having to worry about the Caffe => Keras weight conversion process.

In fact, it’s now as simple as these three lines of code to classify an image using a Convolutional Neural Network pre-trained on the ImageNet dataset with Python and Keras:

model = VGG16(weights="imagenet")
preds = model.predict(preprocess_input(image))
print(decode_predictions(preds))

Of course, there are a few other imports and helper functions that need to be utilized — but I think you get the point:

It’s now dead simple to apply ImageNet-level pre-trained networks using Python and Keras.

To find out how, keep reading.

ImageNet classification with Python and Keras

In the remainder of this tutorial, I’ll explain what the ImageNet dataset is, and then provide Python and Keras code to classify images into 1,000 different categories using state-of-the-art network architectures.

What is ImageNet?

Within computer vision and deep learning communities, you might run into a bit of contextual confusion surrounding what ImageNet is and what it isn’t.

You see, ImageNet is actually a project aimed at labeling and categorizing images into almost 22,000 categories based on a defined set of words and phrases. At the time of this writing, there are over 14 million images in the ImageNet project.

So, how is ImageNet organized?

To order such a massive amount of data, ImageNet actually follows the WordNet hierarchy. Each meaningful word/phrase inside WordNet is called a “synonym set” or “synset” for short. Within the ImageNet project, images are organized according to these synsets, with the goal being to have 1,000+ images per synset.

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

In the context of computer vision and deep learning, whenever you hear people talking about ImageNet, they are very likely referring to the ImageNet Large Scale Visual Recognition Challenge, or simply ILSVRC for short.

The goal of the image classification track in this challenge is to train a model that can classify an image into 1,000 separate categories using over 100,000 test images — the training dataset itself consists of approximately 1.2 million images.

Be sure to keep the context of ImageNet in mind when you’re reading the remainder of this blog post or other tutorials and papers related to ImageNet. In the context of image classification, object detection, and scene understanding, we often use ImageNet to mean the classification challenge and its associated dataset, but remember that there is also a broader project called ImageNet where these images are collected, annotated, and organized.

Configuring your system for Keras and ImageNet

To configure your system to use the state-of-the-art VGG16, VGG19, and ResNet50 networks, make sure you follow my previous tutorial on installing Keras.

The Keras library will use PIL/Pillow for some helper functions (such as loading an image from disk). You can install Pillow, the more Python friendly fork of PIL, by using this command:

$ pip install pillow

To run the networks pre-trained on the ImageNet dataset with Python, you’ll need to make sure you have the latest version of Keras installed. At the time of this writing, the latest version of Keras is 1.0.6, the minimum requirement for utilizing the pre-trained models.

You can check your version of Keras by executing the following commands:

$ python
>>> import keras
Using Theano backend.
Using gpu device 1: GeForce GTX TITAN X (CNMeM is disabled, cuDNN 4007)
>>> keras.__version__
'1.0.6'
>>>

Alternatively, you can use pip freeze to list out the packages installed in your environment:

Figure 1: Listing the set of Python packages installed in your environment.

If you are using a version of Keras prior to 1.0.6, uninstall it, and then use my previous tutorial to install the latest version.
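If you would rather perform this check inside a script than at the interactive prompt, a comparison along the following lines works. This is a minimal sketch: parse_version and meets_minimum are hypothetical helpers, not part of Keras, and they assume plain dotted numeric version strings such as 1.0.6.

```python
# Hypothetical helpers (not part of Keras): compare dotted version strings
# numerically, so that "1.0.10" is correctly treated as newer than "1.0.6".

def parse_version(version):
    """Turn a string like '1.0.6' into a tuple of ints: (1, 0, 6)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum="1.0.6"):
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# In a real script you would pass keras.__version__ here.
for candidate in ("1.0.5", "1.0.6", "1.0.10"):
    print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers rather than raw strings avoids the trap where "1.0.10" sorts before "1.0.6" alphabetically.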

Next, to gain access to VGG16, VGG19, and the ResNet50 architectures and pre-trained weights, you need to clone the deep-learning-models repository from GitHub:

$ git clone http://ift.tt/2ajePvK

From there, change into the deep-learning-models directory and ls the contents:

$ cd deep-learning-models
$ ls -l
total 40
-rw-rw-r-- 1 adrian adrian  1233 Aug  6 11:20 imagenet_utils.py
-rw-rw-r-- 1 adrian adrian  1074 Aug  6 11:20 LICENSE
-rw-rw-r-- 1 adrian adrian  2569 Aug  6 11:20 README.md
-rw-rw-r-- 1 adrian adrian 10258 Aug  6 11:20 resnet50.py
-rw-rw-r-- 1 adrian adrian  7225 Aug  6 11:20 vgg16.py
-rw-rw-r-- 1 adrian adrian  7508 Aug  6 11:20 vgg19.py

Notice how we have four Python files. The resnet50.py, vgg16.py, and vgg19.py files correspond to their respective network architecture definitions.

The imagenet_utils file, as the name suggests, contains a couple of helper functions that allow us to prepare images for classification as well as obtain the final class label predictions from the network.

Keras and Python code for ImageNet CNNs

We are now ready to write some Python code to classify image contents utilizing Convolutional Neural Networks (CNNs) pre-trained on the ImageNet dataset.

To start, open up a new file, name it test_imagenet.py, and insert the following code:

# import the necessary packages
from keras.preprocessing import image as image_utils
from imagenet_utils import decode_predictions
from imagenet_utils import preprocess_input
from vgg16 import VGG16
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
        help="path to the input image")
args = vars(ap.parse_args())

# load the original image via OpenCV so we can draw on it and display
# it to our screen later
orig = cv2.imread(args["image"])

We start on Lines 2-8 by importing our required Python packages. Line 2 imports the image pre-processing module directly from the Keras library. However, Lines 3-5 import functions and network architectures from within the deep-learning-models directory. Because of this, you’ll want to make sure your test_imagenet.py file is inside the deep-learning-models directory (or your PYTHONPATH is updated accordingly), otherwise your script will fail to import these functions.
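If you prefer to keep your script outside the repository without touching your shell configuration, one option is to extend Python’s module search path at the top of the script. This is a minimal sketch; the directory shown is a placeholder for wherever you cloned the repository.

```python
import os
import sys

# Placeholder: replace with the location of your own clone.
MODELS_DIR = os.path.expanduser("~/deep-learning-models")

# Put the directory at the front of the module search path so that
# `from vgg16 import VGG16` can be resolved from any working directory.
if MODELS_DIR not in sys.path:
    sys.path.insert(0, MODELS_DIR)

print(sys.path[0])
```

This has the same effect as updating PYTHONPATH, but only for the script that runs it.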

Alternatively, you can use the “Downloads” section at the bottom of this tutorial to download the source code + example images. This download ensures the code is configured correctly and that your directory structure is set up properly.

Lines 11-14 parse our command line arguments. We only need a single switch here, --image, which is the path to our input image.

We then load our image in OpenCV format on Line 18. This step isn’t strictly required since Keras provides helper functions to load images (which I’ll demonstrate in the next code block), but there are differences in how both these functions work, so if you intend on applying any type of OpenCV functions to your images, I suggest loading your image via cv2.imread and then again via the Keras helpers. Once you get a bit more experience manipulating NumPy arrays and swapping channels, you can avoid the extra I/O overhead, but for the time being, let’s keep things simple.
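As a taste of the channel swapping mentioned above: OpenCV returns images as height x width x channels arrays in BGR order, whereas the Keras helpers (with the Theano backend used in this post) produce RGB data in channels x height x width order. The sketch below converts one layout into the other using NumPy alone. A tiny synthetic array stands in for a real image, bgr_to_keras_layout is a hypothetical name, and resizing to 224 x 224 is deliberately left out.

```python
import numpy as np


def bgr_to_keras_layout(bgr):
    """Convert an OpenCV-style (H, W, 3) BGR array to a (3, H, W) RGB array."""
    rgb = bgr[:, :, ::-1]                 # reverse the channel axis: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))    # move the channel axis to the front
    return chw.astype("float32")


# A synthetic 2x2 "image" whose pixels are all pure blue in BGR order.
bgr = np.zeros((2, 2, 3), dtype="uint8")
bgr[:, :, 0] = 255

converted = bgr_to_keras_layout(bgr)
print(converted.shape)
```

After the conversion, the blue values sit in the last of the three channel planes, which is where an RGB, channels-first array keeps them.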

# import the necessary packages
from keras.preprocessing import image as image_utils
from imagenet_utils import decode_predictions
from imagenet_utils import preprocess_input
from vgg16 import VGG16
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
        help="path to the input image")
args = vars(ap.parse_args())

# load the original image via OpenCV so we can draw on it and display
# it to our screen later
orig = cv2.imread(args["image"])

# load the input image using the Keras helper utility while ensuring
# that the image is resized to 224x224 pixels, the required input
# dimensions for the network -- then convert the PIL image to a
# NumPy array
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(args["image"], target_size=(224, 224))
image = image_utils.img_to_array(image)

Line 25 applies the .load_img Keras helper function to load our image from disk. We supply a target_size of 224 x 224 pixels, the required spatial input image dimensions for the VGG16, VGG19, and ResNet50 network architectures.

After calling .load_img, our image is actually in PIL/Pillow format, so we need to apply the .img_to_array function to convert the image to a NumPy format.

Next, let’s preprocess our image:

# import the necessary packages
from keras.preprocessing import image as image_utils
from imagenet_utils import decode_predictions
from imagenet_utils import preprocess_input
from vgg16 import VGG16
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
        help="path to the input image")
args = vars(ap.parse_args())

# load the original image via OpenCV so we can draw on it and display
# it to our screen later
orig = cv2.imread(args["image"])

# load the input image using the Keras helper utility while ensuring
# that the image is resized to 224x224 pixels, the required input
# dimensions for the network -- then convert the PIL image to a
# NumPy array
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(args["image"], target_size=(224, 224))
image = image_utils.img_to_array(image)

# our image is now represented by a NumPy array of shape (3, 224, 224),
# but we need to expand the dimensions to be (1, 3, 224, 224) so we can
# pass it through the network -- we'll also preprocess the image by
# subtracting the mean RGB pixel intensity from the ImageNet dataset
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)

If we inspect the .shape of our image right after the call to .img_to_array, you’ll notice the shape of the NumPy array is (3, 224, 224): each image is 224 pixels wide, 224 pixels tall, and has 3 channels (one for each of the Red, Green, and Blue channels, respectively).

However, before we can pass our image through our CNN for classification, we need to expand the dimensions to be (1, 3, 224, 224).

Why do we do this?

When classifying images using Deep Learning and Convolutional Neural Networks, we often send images through the network in “batches” for efficiency. Thus, it’s actually quite rare to pass only one image at a time through the network — unless of course, you only have one image to classify (like we do).
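To make the batch dimension concrete, here is a small sketch that uses zero-filled arrays in place of real images. It assumes the channels-first layout used throughout this post.

```python
import numpy as np

# Stand-ins for two preprocessed images, each of shape (3, 224, 224).
first = np.zeros((3, 224, 224), dtype="float32")
second = np.zeros((3, 224, 224), dtype="float32")

# A single image becomes a batch of one by adding a leading axis.
single = np.expand_dims(first, axis=0)
print(single.shape)

# Several images can be stacked along that same leading axis.
batch = np.stack([first, second], axis=0)
print(batch.shape)
```

Either array can be handed to a network that expects input of shape (N, 3, 224, 224); only N differs.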

We then preprocess the image on Line 33 by subtracting the mean RGB pixel intensity computed from the ImageNet dataset.
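Conceptually, mean subtraction looks something like the sketch below. This is not the code from imagenet_utils: the function name is hypothetical, the per-channel means are the values commonly quoted for the VGG networks and are an assumption here, and the real helper may also reorder the channels.

```python
import numpy as np

# Assumed per-channel means (R, G, B), as commonly quoted for VGG models.
ASSUMED_RGB_MEANS = np.array([123.68, 116.779, 103.939], dtype="float32")


def subtract_channel_means(batch, means=ASSUMED_RGB_MEANS):
    """Subtract one mean per channel from a (N, 3, H, W) batch."""
    # Reshape to (1, 3, 1, 1) so the means broadcast over N, H and W.
    return batch.astype("float32") - means.reshape(1, 3, 1, 1)


# A batch holding one 2x2 image in which every value is 128.
batch = np.full((1, 3, 2, 2), 128.0, dtype="float32")
centered = subtract_channel_means(batch)
print(centered[0, :, 0, 0])
```

Centering each channel around zero in this way matches the inputs to the statistics the network saw during training.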

Finally, we can load our Keras network and classify the image:

# import the necessary packages
from keras.preprocessing import image as image_utils
from imagenet_utils import decode_predictions
from imagenet_utils import preprocess_input
from vgg16 import VGG16
import numpy as np
import argparse
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
        help="path to the input image")
args = vars(ap.parse_args())

# load the original image via OpenCV so we can draw on it and display
# it to our screen later
orig = cv2.imread(args["image"])

# load the input image using the Keras helper utility while ensuring
# that the image is resized to 224x224 pixels, the required input
# dimensions for the network -- then convert the PIL image to a
# NumPy array
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(args["image"], target_size=(224, 224))
image = image_utils.img_to_array(image)

# our image is now represented by a NumPy array of shape (3, 224, 224),
# but we need to expand the dimensions to be (1, 3, 224, 224) so we can
# pass it through the network -- we'll also preprocess the image by
# subtracting the mean RGB pixel intensity from the ImageNet dataset
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)

# load the VGG16 network
print("[INFO] loading network...")
model = VGG16(weights="imagenet")

# classify the image
print("[INFO] classifying image...")
preds = model.predict(image)
(inID, label) = decode_predictions(preds)[0]

# display the predictions to our screen
print("ImageNet ID: {}, Label: {}".format(inID, label))
cv2.putText(orig, "Label: {}".format(label), (10, 30),
        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)

On Line 37 we initialize our VGG16 class. We could also substitute in VGG19 or ResNet50 here, but for the sake of this tutorial, we’ll use VGG16. Supplying weights="imagenet" indicates that we want to use the pre-trained ImageNet weights for the respective model.

Once the network has been loaded and initialized, we can predict class labels by making a call to the .predict method of the model. These predictions are actually a NumPy array with 1,000 entries: the predicted probabilities associated with each class in the ImageNet dataset.

Calling decode_predictions on these predictions gives us the ImageNet Unique ID of the label, along with a human-readable text version of the label.
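The idea behind decoding can be sketched as follows. This is an illustration rather than the repository’s implementation: a made-up four-class table with placeholder IDs stands in for the 1,000-entry ImageNet class index, the probabilities are invented, and decode_top1 is a hypothetical name.

```python
import numpy as np

# Made-up class table: index -> (placeholder ID, human-readable label).
TOY_CLASS_INDEX = {
    0: ("id_0", "beagle"),
    1: ("id_1", "beer glass"),
    2: ("id_2", "brown bear"),
    3: ("id_3", "space shuttle"),
}


def decode_top1(preds, class_index=TOY_CLASS_INDEX):
    """Return one (ID, label) pair per row of a (N, num_classes) array."""
    best = np.argmax(preds, axis=-1)  # most probable class for each image
    return [class_index[int(i)] for i in best]


# Invented probabilities for a batch containing a single image.
preds = np.array([[0.05, 0.10, 0.80, 0.05]])
(toy_id, label) = decode_top1(preds)[0]
print(toy_id, label)
```

The essential step is the argmax over the class axis; everything else is a table lookup.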

Finally, Lines 45-49 print the predicted label to our terminal and display the output image to our screen.

ImageNet + Keras image classification results

To apply the Keras models pre-trained on the ImageNet dataset to your own images, make sure you use the “Downloads” form at the bottom of this blog post to download the source code and example images. This will ensure your code is properly formatted (without errors) and your directory structure is correct.

But before we can apply our pre-trained Keras models to our own images, let’s first discuss how the model weights are (automatically) downloaded.

Downloading the model weights

The first time you execute the test_imagenet.py script, Keras will automatically download and cache the architecture weights to your disk in the ~/.keras/models directory.

Subsequent runs of test_imagenet.py will be substantially faster (since the network weights will already be downloaded), but that first run will be quite slow (comparatively), due to the download process.
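If you want to see what has already been cached before running the script, a short check like this one will do. It is a sketch that assumes the ~/.keras/models location described above and that the weight files carry a .h5 extension; list_cached_weights is a hypothetical helper.

```python
import os


def list_cached_weights(models_dir="~/.keras/models"):
    """Return (filename, size in MB) pairs for .h5 files in the cache."""
    models_dir = os.path.expanduser(models_dir)
    if not os.path.isdir(models_dir):
        return []  # nothing has been downloaded yet
    found = []
    for name in sorted(os.listdir(models_dir)):
        if name.endswith(".h5"):
            size = os.path.getsize(os.path.join(models_dir, name))
            found.append((name, size / (1024.0 * 1024.0)))
    return found


for name, size_mb in list_cached_weights():
    print("{}: {:.0f}MB".format(name, size_mb))
```

An empty result simply means the next run of the script will trigger a download.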

That said, keep in mind that these weights are fairly large HDF5 files and might take a while to download if you do not have a fast internet connection. For convenience, I have listed out the size of the weights files for each respective network architecture:

ResNet50: 102MB
VGG16: 553MB
VGG19: 574MB

ImageNet and Keras results

We are now ready to classify images using the pre-trained Keras models! To test out the models, I downloaded a couple images from Wikipedia (“brown bear” and “space shuttle”) — the rest are from my personal library.

To start, execute the following command:

$ python test_imagenet.py --image images/dog_beagle.png

Notice that since this is my first run of test_imagenet.py, the weights associated with the VGG16 ImageNet model need to be downloaded:

Figure 2: Downloading the pre-trained ImageNet weights for VGG16.

Once our weights are downloaded, the VGG16 network is initialized, the ImageNet weights loaded, and the final classification is obtained:

Figure 3: Utilizing the VGG16 network trained on ImageNet to recognize a beagle (dog) in an image.

Let’s give another image a try, this one of a beer glass:

$ python test_imagenet.py --image images/beer.png

Figure 4: Recognizing a beer glass using a Convolutional Neural Network trained on ImageNet.

The following image is of a brown bear:

$ python test_imagenet.py --image images/brown_bear.png

Figure 5: Utilizing VGG16, Keras, and Python to recognize the brown bear in an image.

I took the following photo of my keyboard to test out the ImageNet network using Python and Keras:

$ python test_imagenet.py --image images/keyboard.png

Figure 6: Utilizing Python, Keras, and a Convolutional Neural Network trained on ImageNet to recognize image contents.

I then took a photo of my monitor as I was writing the code for this blog post. Interestingly, the network classified this image as “desktop computer”, which makes sense given that the monitor is the primary subject of the image:

$ python test_imagenet.py --image images/monitor.png

Figure 7: Image classification via Python, Keras, and CNNs.

This next image is of a space shuttle:

$ python test_imagenet.py --image images/space_shuttle.png

Figure 8: Recognizing image contents using a Convolutional Neural Network trained on ImageNet via Keras + Python.

The final image is of a steamed crab, a blue crab, to be specific:

$ python test_imagenet.py --image images/steamed_crab.png

Figure 9: Convolutional Neural Networks and ImageNet for image classification with Python and Keras.

What I find interesting about this particular example is that VGG16 classified this image as “Dungeness crab”, which may be technically incorrect. However, keep in mind that blue crabs are called blue crabs for a reason: their outer shell is blue. It is not until you steam them for eating that their shells turn red. The Dungeness crab, on the other hand, has a slightly dark orange tint to it, even before steaming. The fact that the network was even able to label this image as “crab” is very impressive.

A note on model timing

From start to finish (not including the downloading of the network weights files), classifying an image using VGG16 took approximately 11 seconds on my Titan X GPU. This includes the process of actually loading both the image and network from disk, performing any initializations, passing the image through the network, and obtaining the final predictions.

However, once the network is actually loaded into memory, classification takes only 1.8 seconds, which goes to show you how much overhead is involved in actually loading and initializing a large Convolutional Neural Network. Furthermore, since images can be presented to the network in batches, this same time for classification will hold for multiple images.

If you’re classifying images on your CPU, then you should obtain a similar classification time. This is mainly because there is substantial overhead in copying the image from memory over to the GPU. When you pass multiple images via batches, it makes the I/O overhead for using the GPU more acceptable.
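To take this kind of measurement yourself, you can time the loading and classification steps separately. The sketch below uses stand-in functions so that it runs anywhere: load_model and classify are placeholders for constructing the network and calling its .predict method, and the sleep durations are arbitrary.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def load_model():
    time.sleep(0.05)   # placeholder for building the network and loading weights
    return "model"


def classify(model, batch):
    time.sleep(0.01)   # placeholder for model.predict(batch)
    return ["label"] * len(batch)


model, load_secs = timed(load_model)
labels, predict_secs = timed(classify, model, ["image"])
print("load: {:.2f}s, classify: {:.2f}s".format(load_secs, predict_secs))
```

Keeping the two measurements apart makes it clear how much of the total run time is one-off setup rather than classification.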

Summary

In this blog post, I demonstrated how to use the newly released deep-learning-models repository to classify image contents using state-of-the-art Convolutional Neural Networks trained on the ImageNet dataset.

To accomplish this, we leveraged the Keras library, which is maintained by François Chollet — be sure to reach out to him and say thanks for maintaining such an incredible library. Without Keras, deep learning with Python wouldn’t be half as easy (or as fun).

Of course, you might be wondering how to train your own Convolutional Neural Network from scratch using ImageNet. Don’t worry, we’re getting there — we just need to understand the basics of neural networks, machine learning, and deep learning first. Walk before you run, so to speak.

I’ll be back next week with a tutorial on hyperparameter tuning, a key step to maximizing your model’s accuracy.

To be notified when future blog posts are published on the PyImageSearch blog, be sure to enter your email address in the form below. See you next week!

Downloads:

The post ImageNet classification with Python and Keras appeared first on PyImageSearch.

from PyImageSearch http://ift.tt/2bf1Usw
via IFTTT
