Image classifier for Oolong tea and Green tea

Developing the Dataset

In this project, I will be making an image classifier. My previous attempts a while ago, as I remember, did not work. To change it up a bit, I will be using the PyTorch framework rather than TensorFlow. As this will be my first time using PyTorch, I will be taking a tutorial before I begin my project. The project is a classifier that spots the difference between bottled oolong tea and bottled green tea.

The tutorial I used was PyTorch's 60 Minute Blitz. (It did take me more than 60 minutes to complete, though.) After typing out the tutorial I got used to using PyTorch, so I started moving on to the project. As this will be an image classifier, I needed to get a whole lot of images into my dataset. I first stumbled upon a Medium article which used a good scraper. But even after a few edits, it did not work.

image001.png

So I moved to using Bing for image search. Bing has an image API you can use, which makes it easier to collect images compared to Google. I used this article from PyImageSearch. I had a few issues with the API in the beginning, as the endpoints that Microsoft gave me did not work with the tutorial. After looking around and making a few edits, I was able to get it working.

image003.png

But looking at the image folder gave me this:

image005.png

After looking through the code, I noticed that the program did not produce new images but kept saving every image under the name "000000". This was from not copying the final section of code from the blog post, which updated a counter variable.

image007.png
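The counter problem is easy to reproduce. Here is a minimal sketch, not the blog post's code: the `save_all` helper and the fake image bytes are made up for illustration. If the counter is never incremented, every download is written to the same zero-padded filename and overwrites the previous one.

```python
import tempfile
from pathlib import Path


def save_all(payloads, out_dir, increment=True):
    """Write each payload to a zero-padded filename such as 000000.jpg."""
    total = 0
    for data in payloads:
        path = Path(out_dir) / f"{str(total).zfill(6)}.jpg"
        path.write_bytes(data)
        if increment:
            total += 1  # the kind of line that was missing
    return sorted(p.name for p in Path(out_dir).iterdir())


fake_images = [b"img-a", b"img-b", b"img-c"]  # stand-ins for downloaded bytes

buggy = save_all(fake_images, tempfile.mkdtemp(), increment=False)
fixed = save_all(fake_images, tempfile.mkdtemp(), increment=True)
print(buggy)  # ['000000.jpg']
print(fixed)  # ['000000.jpg', '000001.jpg', '000002.jpg']
```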

Now that I had got the tutorial code to work, I could try my own search terms to create my dataset. First I started with green tea, so I used the term "bottle green tea", for which the program gave me these images:

image009.png

Afterwards, I got oolong tea by using the term "bottle oolong tea".

image011.png

Now I had to go through the dataset myself and delete any images that were not relevant to the class. The images I deleted looked like this:

image013.png

This is because we want the classifier to work on bottled drinks, so leaves are not relevant, regardless of how tasty they are.

There were a few blank images. Needless to say, they are not useful for the image classifier.

image015.png
image017.png

Even though this image has a few green tea bottles, it also has an oolong tea bottle, so it will confuse the model. So it's better to keep to images that have only green tea bottles, rather than a whole variety that does not belong to a single class.

After I did that with both datasets, I was ready to move on to creating the model. So I went to Google Colab and imported PyTorch.

As the dataset has fewer than 200 images, I thought it would be a good idea to apply data augmentation. I first found this tutorial which used PyTorch transformations.

When applying the transformations, I ran into a few issues. For one, it did not plot correctly, nor did it recognize my images. But I was able to fix it.

image019.png

The issues stemmed from not slicing the dataset correctly, as ImageFolder (a torchvision helper class) returns each item as an (image, label) tuple, not just a list of images.

Developing the model

After that, I started working on developing the model. I used the CNN used in the 60-minute blitz tutorial. One of the first errors I dealt with was data not going through the network properly.

shape '[-1, 400]' is invalid for input of size 179776

 

I was able to fix this issue by changing the kernel sizes to 2 x 2 and the number of feature maps to 64.

self.fc1 = nn.Linear(64 * 2 * 2, 120) 
x = x.view(-1, 64 * 2 * 2)

Straight afterwards I ran into another error:

ValueError: Expected input batch_size (3025) to match target batch_size (4).

 

This was fixed by reshaping the x variable again, with the help of this forum post.

x = x.view(-1, 64 * 55 * 55) 

Then another error 😩.

RuntimeError: size mismatch, m1: [4 x 193600], m2: [256 x 120] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41

 

This was fixed by changing the linear layer again.

self.fc1 = nn.Linear(64 * 55 * 55, 120)
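The numbers in these three errors can be worked out by hand. The sketch below assumes 224 x 224 input images, a batch size of 4, unpadded stride-1 convolutions and the tutorial's two stages of convolution followed by 2 x 2 max pooling. The image size is not stated in this post; it is an assumption that happens to reproduce the numbers in the error messages. `conv_out` and `flat_features` are helper names made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flat_features(size, conv_kernel, channels):
    """Two conv layers, each followed by 2x2 max pooling, then flatten."""
    for _ in range(2):
        size = conv_out(size, conv_kernel)   # convolution, no padding
        size = conv_out(size, 2, stride=2)   # 2x2 max pool
    return size, channels * size * size


batch, image_size = 4, 224  # assumed values

# Tutorial network: 5x5 kernels, 16 output channels.
size, features = flat_features(image_size, 5, 16)
print(size, batch * features)  # 53 179776 -> the "input of size 179776"

# Edited network: 2x2 kernels, 64 output channels.
size, features = flat_features(image_size, 2, 64)
print(size, features)  # 55 193600 -> nn.Linear(64 * 55 * 55, 120)

# view(-1, 64 * 2 * 2) squeezes the same numbers into rows of 256,
# which is where the bogus batch size of 3025 comes from.
print(batch * features // (64 * 2 * 2))  # 3025
```

So the dense layer's input size has to match channels x height x width after the last pooling step, and the `view` call has to use the same number.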
 

Damn, I did not know one dense layer can give me so many headaches.

 

After training, I needed to test the model. I did not make the test folder before making the model (rookie mistake). I made it quickly afterwards by using the first 5 images of each class. This is a bad thing to do: the model has already seen those images during training, so it contaminates the test data and can hide overfitting. But I needed to see if the model was working at the time.
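A cleaner way is to set the test images aside before any training happens. Here is a minimal sketch of that idea, not what I did at the time; the `split_files` helper, the 20% fraction and the filenames are all made up for illustration.

```python
import random


def split_files(filenames, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out a fraction that training never sees."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    n_test = max(1, int(len(files) * test_fraction))
    return files[n_test:], files[:n_test]


# Hypothetical filenames in a zero-padded style.
green_tea = [f"{n:06d}.jpg" for n in range(50)]
train_files, test_files = split_files(green_tea)

print(len(train_files), len(test_files))   # 40 10
print(set(train_files) & set(test_files))  # set()
```

The empty intersection is the important part: no image appears on both sides of the split.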

I wanted to plot one of the images in the test folder, so I borrowed the code from the tutorial. This led to an error, but I fixed it by changing the range to one instead of 5. This was because my model only has 2 labels (tensor[0] and tensor[1]), not 4.

When I loaded the model, it threw an error. But this was fixed by resizing the images in the test folder. After a few runs of the model, I noticed that it did not print the loss, so I edited the code to do so.

if i % 10 == 0:
    print('[%d, %d] loss: %.5f' %
          (epoch + 1, i + 1, running_loss / 10))
    running_loss = 0.0
image021.png
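One detail worth noting about the check above: `i % 10 == 0` is also true for the very first batch (`i = 0`), so the first printed value is a single batch's loss divided by 10. The PyTorch tutorial reports on the last batch of each window instead (`i % 2000 == 1999`). Here is a minimal sketch of that pattern with a window of 10; the loss values are made up so the example runs without a model.

```python
# Made-up per-batch losses standing in for loss.item() in a training loop.
batch_losses = [0.70] * 10 + [0.50] * 10

report_every = 10
running_loss = 0.0
reports = []

for i, loss in enumerate(batch_losses):
    running_loss += loss
    # True on batches 10, 20, ... so each report covers a full window.
    if i % report_every == report_every - 1:
        reports.append((i + 1, running_loss / report_every))
        print('[%d] loss: %.5f' % (i + 1, running_loss / report_every))
        running_loss = 0.0
```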

As we can see the loss is very high.

When I tested the model on the test folder it gave me this:

image023.png

Which means it's at best guessing. I later found it was because it picked every image as green tea. With 5 of the images carrying a green tea label, this led it to be right 50% of the time.
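A quick way to spot this kind of collapse is to count the predictions per class. This is a small sketch with hand-written labels; which class gets index 0 is an assumption.

```python
from collections import Counter

# First test folder: 5 green tea images and 5 oolong tea images.
GREEN, OOLONG = 0, 1  # assumed label indices
truth = [GREEN] * 5 + [OOLONG] * 5
predicted = [GREEN] * 10  # the model picks green tea every time

correct = sum(t == p for t, p in zip(truth, predicted))
print(Counter(predicted))          # Counter({0: 10})
print(100 * correct / len(truth))  # 50.0
```

If one class gets all the predictions, the accuracy is just the share of that class in the test set.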

So this leads me to the world of model debugging: trying to reduce the loss and improve accuracy.

Debugging the model

I started to make some progress debugging my model when I found this Medium article.

The first point the writer made was to start with a simple model that is known to work with your type of data. I thought I was already using a simple model designed to work with image data, as I was borrowing the model from the PyTorch tutorial. But it did not work. So I opted for a simpler model shape, which I found in a TensorFlow tutorial. It only had 3 convolutional layers and two dense layers. I had to change the final layer parameters as they were giving me errors, since it was designed with 10 targets in mind instead of 2. Afterwards, I fiddled around with the hyperparameters. With that, I was able to get the accuracy on the test images to 80% 😀.

Accuracy of the network on the 10 test images: 80 %
10
8 
image025.png
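For reference, here is a sketch of what a network of that shape can look like in PyTorch. The channel counts, 3 x 3 kernels, 64 dense units and 224 x 224 input size are assumptions, not the exact values from my notebook, and `SmallNet` is a made-up name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallNet(nn.Module):
    """Three conv layers and two dense layers, ending in 2 outputs."""

    def __init__(self, image_size=224, num_classes=2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 64, 3)
        self.pool = nn.MaxPool2d(2, 2)
        # Work out the flattened size with a dummy pass instead of by hand.
        with torch.no_grad():
            dummy = torch.zeros(1, 3, image_size, image_size)
            n_flat = self._features(dummy).shape[1]
        self.fc1 = nn.Linear(n_flat, 64)
        self.fc2 = nn.Linear(64, num_classes)  # 2 targets, not 10

    def _features(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        return torch.flatten(x, 1)

    def forward(self, x):
        x = F.relu(self.fc1(self._features(x)))
        return self.fc2(x)


model = SmallNet()
out = model(torch.zeros(4, 3, 224, 224))
print(out.shape)  # torch.Size([4, 2])
```

Letting a dummy forward pass compute the dense layer's input size avoids the kind of shape errors I hit with the first model.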

Testing the new model

As the test dataset was contaminated because I used images from the training dataset, I wanted to restructure the test dataset with new images, to make sure the accuracy was correct.

I restructured it in the following style:

https://stackoverflow.com/a/60333941

The test and train datasets are then called separately:

train_dataset = ImageFolder(root='data/train')
test_dataset  = ImageFolder(root='data/test')

For the test images, I decided to use Google instead of Bing, as it gives different results. After that, I tested the model on the new test dataset.

Accuracy of the network on the 10 test images: 70 %
10
7

As this was not a significant decrease, the model has learnt something about green tea and oolong tea.

Using the code from the PyTorch tutorial, I wanted to analyse it even further:

Accuracy of Green_tea_test : 80 %
Accuracy of oolong_tea_test : 60 %
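The per-class numbers come from counting correct predictions separately for each label. Here is a small sketch of that counting; the label lists are illustrative only, arranged to match the totals reported above and assuming 5 test images per class.

```python
classes = ['Green_tea_test', 'oolong_tea_test']

# Illustrative labels only: 0 = green tea, 1 = oolong tea.
truth     = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

class_correct = [0] * len(classes)
class_total = [0] * len(classes)
for t, p in zip(truth, predicted):
    class_total[t] += 1
    class_correct[t] += int(t == p)

for name, c, n in zip(classes, class_correct, class_total):
    print('Accuracy of %s : %d %%' % (name, 100 * c / n))
```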

Plotting the predictions

While I like this, I want the program to tell me which images it got wrong. So I went to work trying to do so. To do this, I stitched the image data together with the labels in a separate list.

merge_list = []

# Pair every test image with its true and predicted label.
for i, t, p in zip(img_list, truth_label, predicted_label):
    one_merge_dict = {'image': i, 'truth_label': t, 'predicted_label': p}
    merge_list.append(one_merge_dict)

print(merge_list)

On my first try I got this:

image029.png


As we can see, it's very cluttered and shows all the images. To clear it up, I removed the unneeded text.

image031.png

Now I can start separating the images the model got right from the ones it got wrong.

I was able to do this by using a small if statement.

Now the program correctly plots only the images with an incorrect label. But the placement of the images is wrong. This is because each plot still uses the position it had among all the images, including the correct ones that the if statement skips.


I corrected it by changing the loop:

image033.png
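The idea behind the fix can be sketched without any plotting: filter out the wrong predictions first, then number only what is left. The records below are illustrative, with filenames in place of image tensors, and this is a guess at the approach rather than the exact loop in the screenshot.

```python
# Illustrative records in the same shape as merge_list.
merge_list = [
    {'image': 'a.jpg', 'truth_label': 0, 'predicted_label': 0},
    {'image': 'b.jpg', 'truth_label': 0, 'predicted_label': 1},
    {'image': 'c.jpg', 'truth_label': 1, 'predicted_label': 1},
    {'image': 'd.jpg', 'truth_label': 1, 'predicted_label': 0},
]

# Keep only the misclassified images.
wrong = [d for d in merge_list if d['truth_label'] != d['predicted_label']]

# Subplot positions are 1-based, so the slots become 1, 2, ... with no gaps.
slots = [(i + 1, d['image']) for i, d in enumerate(wrong)]
print(slots)  # [(1, 'b.jpg'), (2, 'd.jpg')]
```

Each slot number can then be passed as the third argument of `plt.subplot`.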

I wanted to get rid of the whitespace, so I decided to change the plotting of images.

  

fig = plt.figure(figsize=(15,15))

ax = plt.subplot(1, 4, i + 1)

image035.png

Now I have an idea of what the model got wrong. In the first sample, the green tea does not have the traditional green design, so it's understandable that it got it wrong. The second sample was oolong tea, but the model misclassified it as green tea. My guess is that it's because the bottle has a very light colour tone compared to the golden or orange tone of the oolong bottles in the training data. Then there is the third example, where the bottle has the traditional oolong design with an orange colour palette, but the model misclassified it as green tea. I guess that the leaf on the bottle affected the judgement of the model, leading it to classify it as green tea.

Now I have finished the project. This is not to say that I will not come back to it, as additions could be made on the implementation side, like a mobile app that can detect oolong or green tea with your phone's camera, or a simple web app where users can upload their bottled tea images and have the model classify them on the website.