I just finished reading Maria Yao's article "Chihuahua OR Muffin? Searching For The Best Computer Vision API". It's a fun read, but I felt it didn't really show off the power of Watson Visual Recognition.
The demo in the article used the general classifier. One of the main advantages of Watson Visual Recognition is that you can create your own custom classifiers, and it's very simple to do.
First, you need data.
Like most data, it needed a bit of cleaning. I deleted any image below 14KB in size, as the majority at that size were corrupted. I also went through and removed any images that were adverts or "this image is no longer there" banners.
Overall that came to 500 images deleted, which still left 3,000 images to play with.
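The size-based cleanup can be sketched in a few lines of Python. This is a sketch of the approach rather than the exact script I used; the folder layout and `.jpg` extension are assumptions, but the 14KB threshold matches what I described above.

```python
from pathlib import Path

# Images under ~14KB were mostly corrupted downloads, so treat that as the floor.
MIN_SIZE = 14 * 1024

def remove_small_images(folder):
    """Delete image files under MIN_SIZE and return how many were removed."""
    removed = 0
    for path in Path(folder).glob("*.jpg"):
        if path.stat().st_size < MIN_SIZE:
            path.unlink()  # delete the (likely corrupted) file
            removed += 1
    return removed

# Example: remove_small_images("chihuahua")
```

The advert and "missing image" banners still had to be weeded out by eye, since they are perfectly valid files.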
Next I created a Visual Recognition service on the free plan, which limited me to 250 events a day. Because of that, I reduced my training sets to 100 pictures each.
I took a random 100 from each. I didn’t examine the photos at all, but here are a few to give you an idea of how the images look.
As you can see, no thought was put into other items appearing in the pictures.
I zipped the images up, and created a classifier like so.
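The sample-and-zip step is easy to sketch with the standard library. The folder and archive names here are just examples; the only real constraint from my run is the sample size of 100.

```python
import random
import zipfile
from pathlib import Path

def zip_random_sample(folder, zip_name, n=100):
    """Pick n random images from folder and write them into a zip archive."""
    images = list(Path(folder).glob("*.jpg"))
    sample = random.sample(images, min(n, len(images)))
    with zipfile.ZipFile(zip_name, "w") as zf:
        for path in sample:
            # arcname keeps just the file name, not the folder path
            zf.write(path, arcname=path.name)
    return len(sample)

# Example: one zip of positive examples per class.
# zip_random_sample("chihuahua", "chihuahua.zip")
# zip_random_sample("muffin", "muffin.zip")
```

One zip per class is what the classifier-creation form expects: each archive becomes the positive examples for one class.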
Then I clicked create, and waited a little over 10 minutes for it to analyse the pictures.
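Under the hood, clicking create issues a single multipart POST to the Visual Recognition v3 `/classifiers` endpoint, with one `<class>_positive_examples` file field per zip. Here is a hedged sketch of how that request is assembled; the class names and zip file names are examples, and the credentials placeholder is an assumption (the web console handles all of this for you).

```python
def build_classifier_request(name, class_zips):
    """Build the multipart form fields for the v3 create-classifier call.

    class_zips maps a class name (e.g. "chihuahua") to the path of its
    zip of positive examples. The API expects one
    "<class>_positive_examples" file field per class, plus a "name" field.
    """
    fields = {"name": name}
    for class_name, zip_path in class_zips.items():
        fields[f"{class_name}_positive_examples"] = zip_path
    return fields

# These fields then go in a multipart POST to the service, roughly:
#   POST .../visual-recognition/api/v3/classifiers?version=...
# with your API credentials attached (exact URL and auth depend on your
# service instance).
```

Training kicks off as soon as the POST is accepted; the response includes the generated classifier ID you use for later classify calls.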
Once the classifier had finished training, I was ready to test. Some may not be aware that Visual Recognition also offers a built-in food classifier. So for my first two tests I ran the images through my custom classifier, General, and Food.
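Running an image through all three classifiers is a single call to the v3 `/classify` endpoint, passing the built-in IDs alongside the custom one. A sketch of the parameters; the custom classifier ID below is a placeholder (the real one is returned when the classifier is created), and the image URL is illustrative.

```python
def build_classify_params(image_url, classifier_ids, version="2016-05-20"):
    """Query parameters for a v3 classify call against several classifiers."""
    return {
        "url": image_url,                          # image to classify
        "classifier_ids": ",".join(classifier_ids),  # comma-separated list
        "version": version,
    }

# My custom classifier plus the built-in general ("default") and "food" ones.
# "chihuahuamuffin_12345" is a made-up ID for illustration.
params = build_classify_params(
    "https://example.com/muffin.jpg",
    ["chihuahuamuffin_12345", "default", "food"],
)
```

Each classifier in the list returns its own set of classes and confidence scores in the response, which is what the side-by-side comparisons below are showing.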
So you can see the red bar on the classifier I made. This is mostly because I only gave it 100 examples; as you provide more training examples, its confidence increases. But you can see that the difference between Muffin and Chihuahua is clear.
You can see the food classifier got it as well.
What about the Chihuahua?
As you can see, all three do quite well on classification. But what about the original pictures that look similar? I ran those through and ended up with this.
As you can see, it got them all right! None of these images were used in training.
As demos go, this is simple and fun. But with well-classified training images, it can be scarily accurate for real-world use cases.
Having said that, I did have one failure. Testing the samples on Maria's page, it correctly identified the Cookie Monster muffin and the man holding the chihuahua.
But it could not get the muffin in the plastic bag with the Chihuahua. I tried cropping out the dog, but it still failed, though with a lower confidence. I suspect this is a combination of limited training data and a poor-quality photo.