Classifying an Image Dataset Created Using Google Images
In this blog post, I explain the approach and step-by-step execution of how I created an image dataset by downloading images from Google search and classified them with a feed-forward neural network built in PyTorch.
This project has several sub-components, explained in detail below. The complete source code is available at the link given at the end.
Creating a dataset
I wanted to build my own custom dataset for this project from scratch, so I didn't resort to any famous datasets like MNIST or CIFAR10. To download images in bulk from Google image search, I used the tutorial from the pyimagesearch.com website: "Create a deep learning dataset using Google Images".
This is a 4-step process,
- Search for the required images one by one; I chose to download cats and elephants.
- Using the JavaScript code given in the tutorial, extract all the image URLs to a text file.
- Run the given Python code to download the images from the URLs in the file (a minimal sketch follows this list).
- Manually validate the images, removing corrupted files and irrelevant results.
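To give an idea of the download step, here is a minimal sketch. The filenames urls.txt and images/ are placeholders of my own; the tutorial ships its own, more robust script.

```python
# Minimal sketch of the download step. Assumes the image URLs were
# saved one per line in "urls.txt" (both paths are placeholders).
import os
import requests

os.makedirs("images", exist_ok=True)

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        # Save each image with a zero-padded sequential filename
        with open(f"images/{i:05d}.jpg", "wb") as out:
            out.write(r.content)
    except Exception as e:
        print(f"Skipping {url}: {e}")
```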
The tutorial provides the code snippets, and they are easy to use. In the codebase I downloaded, I found a folder with images of Santa, so I added those to my dataset. The objective here, then, is classifying cats, elephants, and Santas!
Unlike MNIST or other standard datasets, the downloaded images are not perfect. They need to be verified, labeled, and organized into a dataset. Getting the folder structure right is an important step, and getting it wrong can lead to confusion later. Below is the structure I used, explored with a short code snippet.
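A short snippet like this can confirm the structure. It assumes a root folder named data/ with one subfolder per class; the folder names are illustrative.

```python
# Explore the folder structure, assuming a root folder "data/" with
# one subfolder per class (e.g. cats, elephants, santa).
import os

data_dir = "data"
print(os.listdir(data_dir))  # the class folders

for cls in os.listdir(data_dir):
    n = len(os.listdir(os.path.join(data_dir, cls)))
    print(f"{cls}: {n} images")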
Sample view of the raw images downloaded from Google. You can notice images of different sizes and orientations, and some that are not so clear.
It is a prerequisite to transform, resize, crop, and normalize the images so the model can consume them without difficulty. The images are then converted into tensors and pushed into a data loader.
Let’s take a look at the refined images and the code that follows.
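A sketch of the transform pipeline could look like the following, assuming 64x64 images (which matches the 12,288 input size used later); the normalization statistics here are illustrative assumptions.

```python
# Transform pipeline: resize, convert to tensor, normalize.
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

transform = T.Compose([
    T.Resize((64, 64)),   # force every image to 64x64
    T.ToTensor(),         # PIL image -> tensor with values in [0, 1]
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # assumed stats
])

# ImageFolder labels each image by its parent folder name
dataset = ImageFolder("data", transform=transform)
print(dataset.classes)  # e.g. ['cats', 'elephants', 'santa']
```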
Let us split the dataset into training and validation sets, and then load it using data loaders for batch processing. I chose a batch size of 64.
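A sketch of the split and the loaders might look like this; the 90/10 split ratio is my assumption.

```python
# Split into training and validation sets, then wrap them in data
# loaders with a batch size of 64 (the 90/10 split is assumed).
from torch.utils.data import random_split, DataLoader

val_size = len(dataset) // 10
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

batch_size = 64
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size * 2)
```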
Now that the dataset is ready, let’s get started with modeling.
Model class & Training on GPU
It’s a best practice to create a base class and helper functions, which makes the code reusable for different datasets and easier to change later. The base class includes functions for the training and validation steps; alongside it, there is an ‘evaluate’ function to validate the model and a ‘fit’ function to run the training loop.
The loss function used is cross-entropy, which proves effective for image classification models.
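Below is a minimal sketch of such a base class together with the evaluate and fit helpers. The method names follow a common PyTorch-course convention and are assumptions rather than the exact source code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of predictions that match the labels
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)
        return F.cross_entropy(out, labels)  # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return {"val_loss": loss.detach(), "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average loss and accuracy across all validation batches
        loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        acc = torch.stack([x["val_acc"] for x in outputs]).mean()
        return {"val_loss": loss.item(), "val_acc": acc.item()}

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    return model.validation_epoch_end(
        [model.validation_step(batch) for batch in val_loader])

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        print(f"Epoch {epoch}: val_loss {result['val_loss']:.4f}, "
              f"val_acc {result['val_acc']:.4f}")
        history.append(result)
    return history
```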
CPU Vs GPU
This model can be trained on a CPU, as the volume of images is low. However, a GPU brings more compute power and reduces the training time. I trained the model on my laptop, which does not have a GPU, but the code can switch devices based on availability. I also tried running it on Kaggle, where it picked up the GPU; that cut the training time, and it doesn't slow down your own system ;)
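A common pattern for this device switching looks like the following sketch; helper names like get_default_device are conventional, not necessarily the exact ones in the notebook.

```python
# Device helpers so the same code runs on CPU or GPU, whichever
# is available (a commonly used pattern, assumed here).
import torch

def get_default_device():
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    # Recursively move tensors (or lists/tuples of tensors) to the device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader to move each batch to the chosen device."""
    def __init__(self, dl, device):
        self.dl, self.device = dl, device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
```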
Model Training
In this project, I used a feed-forward neural network for image classification; a convolutional neural network could be used as well.
I used 4 hidden layers in this feed-forward NN since the input size is large: 12,288, which is the image size multiplied by the number of colour channels (3*64*64). The output size is 3, one for each image category to be classified: cats, elephants, and Santas.
The model class is shown below:
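Here is a rough sketch of what such a model could look like, building on the ImageClassificationBase sketched earlier. The hidden-layer widths (1024, 512, 256, 64) are illustrative assumptions.

```python
import torch.nn as nn

# Feed-forward model: 4 hidden layers, input size 12,288 (3*64*64),
# output size 3. Hidden widths are assumptions for illustration.
class ImageModel(ImageClassificationBase):
    def __init__(self, in_size=3 * 64 * 64, out_size=3):
        super().__init__()
        self.network = nn.Sequential(
            nn.Flatten(),              # 3x64x64 image -> 12,288 vector
            nn.Linear(in_size, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, out_size),   # raw scores for the 3 classes
        )

    def forward(self, x):
        return self.network(x)
```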
The activation function used here is the Rectified Linear Unit aka ReLU.
Fit the model
The initial evaluation shows high losses and an accuracy of only around 23%, which is expected from an untrained model.
It’s time to train the model by loading the data batch by batch using the data loaders.
In the first round of 20 epochs at a learning rate of 1e-1, the accuracy went up to 69%, though the model had already reached 79% by the 10th epoch. So I trained it for another 5 epochs at a learning rate of 1e-3, and the accuracy peaked at 79%.
The losses fluctuated heavily at first and died down as the epochs increased.
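Put together, the training rounds could look something like this; the exact calls are reconstructed from the numbers above, not copied from the notebook.

```python
# Training rounds, using the helpers sketched earlier.
model = to_device(ImageModel(), device)
train_dl = DeviceDataLoader(train_loader, device)
val_dl = DeviceDataLoader(val_loader, device)

print(evaluate(model, val_dl))            # untrained: roughly 23% accuracy
history = fit(20, 1e-1, model, train_dl, val_dl)   # first round, 20 epochs
history += fit(5, 1e-3, model, train_dl, val_dl)   # fine-tuning, 5 epochs
```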
Final check
Once trained to an accuracy of 79%, the model gave out stunning results on the test dataset.
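For reference, a hypothetical helper for single-image predictions could look like this; predict_image is my own name, not necessarily the one in the source.

```python
import torch

# Hypothetical helper: predict the class of a single image
def predict_image(img, model, classes):
    xb = to_device(img.unsqueeze(0), device)  # add a batch dimension
    out = model(xb)
    _, pred = torch.max(out, dim=1)
    return classes[pred.item()]

img, label = val_ds[0]
print("Predicted:", predict_image(img, model, dataset.classes))
```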
Conclusion
The challenging part of this project was getting the dataset right and refining it well enough for the model to work well. The feed-forward network was able to achieve an accuracy of 79%, which is excellent for such a simple architecture.
With further modification of the NN architecture, the accuracy can be improved significantly; rebuilding the project with a convolutional neural network should push the model to its peak performance.
Refer to the complete source code here: https://jovian.ml/arunkumar-try/05-courseproject-arunkumar-try
Also, you can make use of the dataset which I uploaded onto Kaggle: https://www.kaggle.com/arunkumartry/catselephantssantas
Thanks for reading. Any feedback is appreciated.