Training a Computer to Classify Distinct Images Using a Convolutional Neural Network

Cyra Aggarwal
5 min read · May 22, 2021

I taught my computer to differentiate between a dog and a cat using a Convolutional Neural Network, and the result was not what I was expecting. So, let’s backtrack a bit.

https://www.analyticsvidhya.com/blog/2020/02/mathematics-behind-convolutional-neural-network/

I am sure we are all aware that Artificial Intelligence has become prevalent in our daily lives over the past few years. One of the most common ways Machine Learning is applied is image classification, which is usually done with Convolutional Neural Networks (CNNs).

https://deepnote.com/@donatien-subts/Untitled-Python-Project-IcLtjwWZQSi3aDNHWBtKdA

So, now for a bit of background on what exactly a CNN is. A CNN has several parts:

  • Convolutional layers (when you add one, you have to choose how many filters it has and what size they are)
  • A respective set of filters for each layer
  • Filters that pick up features such as corners, edges, angles, and shapes
  • Different types of training images, so the computer can learn to differentiate between them

Each filter focuses on one feature of the picture, and the network combines those features at the end to sort the picture into a class.
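To make the two choices for a convolutional layer concrete (how many filters, and what size), here is a small sketch of how they set the shape of the layer's output. The helper name and the example numbers (a 224x224 image, 32 filters of size 3x3) are my own illustration, not settings taken from the model in this post.

```python
def conv_output_shape(height, width, kernel_size, num_filters, stride=1, padding=0):
    """Shape (height, width, channels) of a convolutional layer's output.

    Hypothetical helper for illustration. It uses the standard formula
    floor((size + 2 * padding - kernel_size) / stride) + 1 on each spatial axis.
    """
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # Each filter produces one feature map, so the filter count becomes the depth.
    return (out_h, out_w, num_filters)


print(conv_output_shape(224, 224, kernel_size=3, num_filters=32))  # (222, 222, 32)
```

A bigger filter shrinks the output a little more, and more filters means more feature maps stacked on top of each other.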

Example of different filters:

https://debuggercafe.com/visualizing-filters-and-feature-maps-in-convolutional-neural-networks-using-pytorch/

A CNN is also made up of different types of layers, and each one plays a vital part in the final result. There is the convolutional layer (as I said before), the pooling layer, and finally the ‘fully connected’ layer.

https://www.researchgate.net/figure/Schematic-diagram-of-a-basic-convolutional-neural-network-CNN-architecture-26_fig1_336805909

The Convolution Layer: In simple terms, this layer extracts features from an input image. It preserves the relationship between pixels by learning image features from small squares of input data. Convolution is a mathematical operation that takes two inputs: an image matrix and a filter (also called a kernel). Convolving an image with different filters can do things such as detect edges, blur, or sharpen.
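Here is a minimal sketch of that operation in NumPy, assuming no padding and a stride of 1. The tiny image and the vertical edge filter are made up for illustration; a real CNN learns its filter values during training instead of having them written by hand.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + k_h, col:col + k_w]
            output[row, col] = np.sum(patch * kernel)
    return output


# A tiny 5x5 "image": dark (0) on the left, bright (1) on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)

# A hand-written vertical edge filter (illustrative values).
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=np.float32)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is large where the filter sits on the dark-to-bright edge and 0 where the image is flat, which is how one filter ends up "focusing" on one kind of feature.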

The Pooling Layer: Pooling layers reduce the amount of data, and so the number of parameters needed in later layers, when the images are too large. Spatial pooling, also called subsampling or downsampling, lessens the dimensionality of each feature map but keeps the important information. There are three common types of spatial pooling:

  • Max Pooling
  • Average Pooling
  • Sum Pooling
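The three types differ only in how they summarize each small window of the feature map. Below is a rough NumPy sketch, assuming non-overlapping 2x2 windows; the function name and the sample numbers are mine, chosen just to show the idea.

```python
import numpy as np


def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2D feature map."""
    h, w = feature_map.shape
    # Drop any leftover rows/columns that do not fill a whole window.
    trimmed = feature_map[:h - h % size, :w - w % size]
    # Split into windows: axes 1 and 3 index the positions inside each window.
    windows = trimmed.reshape(h // size, size, w // size, size)
    reducers = {"max": np.max, "average": np.mean, "sum": np.sum}
    return reducers[mode](windows, axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 5, 7],
                        [1, 1, 3, 9]], dtype=np.float32)

for mode in ("max", "average", "sum"):
    print(mode, pool2d(feature_map, mode=mode).tolist())
```

In every case the 4x4 map shrinks to 2x2: max pooling keeps the strongest value in each window, average pooling keeps the mean, and sum pooling keeps the total.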

The Fully Connected Layer: In this layer, every input coming from the previous layer is connected to every activation unit of the next layer.
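"Connected to every unit" boils down to a matrix multiplication: each output unit is a weighted sum of all the inputs, plus a bias. A minimal sketch, with made-up inputs, weights, and biases (a trained network would learn the weights and biases itself):

```python
import numpy as np


def fully_connected(inputs, weights, biases):
    """Every input contributes to every output unit: inputs @ weights + biases."""
    return inputs @ weights + biases


# Flatten a small 2x2 feature map into a vector of 4 inputs (illustrative values).
pooled = np.array([[6.0, 2.0],
                   [2.0, 9.0]])
inputs = pooled.flatten()

# Made-up weights: 4 inputs x 2 output units (for example, one unit per class).
weights = np.array([[0.1, -0.1],
                    [0.0, 0.2],
                    [0.3, 0.0],
                    [-0.1, 0.1]])
biases = np.array([0.5, -0.5])

scores = fully_connected(inputs, weights, biases)
print(scores)
```

The weight matrix has one row per input and one column per output unit, so none of the inputs is left out of any output.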

For more information about different layers check out: https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148

https://support-blog.journiapp.com/en-us/article/207-how-to-make-a-journi-photo-book-with-face-recognition

Moving on, CNNs (Convolutional Neural Networks) are very widespread across our everyday devices. One example of image classification is actually surprising.

Some of our iPhones use facial recognition, a type of image classification, to sort photos of different people into their individual folders.

The computer has to be well trained to sort the pictures properly, but it isn’t uncommon to see your sister’s picture in the folder where your wife’s photos are. That happens when your sister and your wife look really similar in that specific picture. The features the model picks up from the two faces are so close that, as far as the code is concerned, the two women look the same.

Now that we have all our basics written down, let’s move on to the main part: teaching the computer to recognize a dog and a cat. First, I used a website called Teachable Machine. I renamed the 2 classes ‘Dog’ and ‘Cat’, then found a few pictures of both animals and added them to their respective classes.

Dog
Cat

Then, I pressed ‘Train Model’. After it was trained, I would choose a picture of a dog or a cat and wait to see the result. And I kid you not, the first few times it got it wrong.

In this picture, it says the photo is 67% dog and only 33% cat. But, in reality, it is supposed to be 100% cat.
In this picture, it says the photo is 66% dog and only 34% cat. But, in reality, it is supposed to be 100% dog.
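The two percentages always add up to 100% because a classifier like this typically ends with a softmax step, which turns the network's raw scores into probabilities. I am assuming that is what happens inside this model; the raw scores below are made up to show how a 67% / 33% split can come about.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)


labels = ["Dog", "Cat"]
raw_scores = np.array([0.7, 0.0])  # made-up values, not from the real model

probabilities = softmax(raw_scores)
for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.0%}")
```

So a small gap between the two raw scores is enough to produce a confident-looking but wrong answer.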

So, I had to run it a few more times... and it worked! Both animals got 100%.

Right result!
Correct!

Now, as for the code… here is a video of me explaining it: https://youtu.be/iNy7EsIzrK0

import tensorflow.keras
from PIL import Image, ImageOps
import numpy as np
# Disable scientific notation for clarity
np.set_printoptions(suppress=True)
# Load the model
model = tensorflow.keras.models.load_model('keras_model.h5')
# Create the array of the right shape to feed into the Keras model.
# The 'length' or number of images you can put into the array is
# determined by the first position in the shape tuple, in this case 1.
data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)
# Replace this with the path to your image.
# convert('RGB') guarantees 3 colour channels, even for images with transparency.
image = Image.open('test_photo.jpg').convert('RGB')
# Resize the image to 224x224 with the same strategy as in TM2:
# resize the image to be at least 224x224, then crop from the center.
size = (224, 224)
# Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10.
image = ImageOps.fit(image, size, Image.LANCZOS)
# Turn the image into a numpy array
image_array = np.asarray(image)
# Display the resized image
image.show()
# Normalize the pixel values from 0..255 to roughly -1..1
normalized_image_array = (image_array.astype(np.float32) / 127.0) - 1
# Load the image into the array
data[0] = normalized_image_array
# Run the inference; the result has one row per image and one score per class
prediction = model.predict(data)
print(prediction)
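The printed prediction is just an array of numbers, one per class. Here is a small sketch of how you could turn it into a readable answer. The helper name, the label order, and the example array are my assumptions; if your export came with a labels file, use the order listed there.

```python
import numpy as np


def top_class(prediction, labels):
    """Return (label, confidence) for the highest-scoring class of one image."""
    scores = np.asarray(prediction)[0]  # first (and only) image in the batch
    best = int(np.argmax(scores))
    return labels[best], float(scores[best])


# Assumed label order: index 0 is 'Dog', index 1 is 'Cat'.
labels = ["Dog", "Cat"]

# Stand-in for the array returned by model.predict(data).
example_prediction = np.array([[0.33, 0.67]], dtype=np.float32)

label, confidence = top_class(example_prediction, labels)
print(f"{label} ({confidence:.0%})")
```

With the real model you would pass `prediction` instead of `example_prediction`.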

Computers will understand sarcasm before Americans do. - Geoffrey Hinton

Looking at everything a CNN can do, it’s not surprising that computers might also detect sarcasm before us. This just shows how advanced Artificial Intelligence has become. Hopefully, it will continue to change and develop for the better and not the worse.

TL;DR: As you can see, a CNN uses filters that pick up different shapes, sizes, lines, and angles to determine which class a picture belongs to.

_________________________________________________________________

Here is a video elaborating further on what a Convolutional Neural Network is: https://youtu.be/YRhxdVk_sIs

And comment below: what is your favourite animal, a 🐶 or a 🐱?
