Author Archives: sodoherty

Meet the Context Object

MrContext
Hi I’m the Context object, and today we are going to learn some tips and tricks about me.

Don’t forget me!

For all you people starting with Watson Assistant (WA), you might not know me. In fact nearly everyone forgets about me for the first application they create.

WA is a stateless system. That means without me it cannot understand where the user left off in the conversation. So it’s very important to send the updated version of me back to WA on your next call.

If your conversation is repeating the same thing over and over, then this is probably why.
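A minimal sketch of that round trip, assuming a hypothetical `send_message(text, context)` callable in place of your real WA call. The `fake_wa` function below is invented purely so the example runs offline; the only point is that the context returned by one call is what you send on the next.

```python
def fake_wa(text, context):
    """Stand-in for a stateless service: all it knows is what arrives in `context`."""
    turn = int(context.get("turn", 0)) + 1
    reply = "Hello!" if turn == 1 else "Turn {}: you said {!r}".format(turn, text)
    return reply, dict(context, turn=turn)


def run_conversation(send_message, user_inputs):
    """Send each input, always passing back the context from the previous reply."""
    context = {}
    replies = []
    for text in user_inputs:
        reply, context = send_message(text, context)  # keep the updated context
        replies.append(reply)
    return replies, context


replies, final_context = run_conversation(fake_wa, ["hi", "order pizza"])
```

If you pass `{}` on every call instead, the fake service answers "Hello!" each time, which is the repeating symptom described above.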

I belong to one workspace

It’s possible to create multi-workspace applications. But a common mistake is thinking that I can be used by multiple workspaces. When I move to a new workspace, I will generate a new conversation ID, and my system map will be meaningless to the new workspace (even if a copy).

Anatomy.

It’s important to understand each of my internal pieces and what they do. I’ll talk a little bit about the input and output parts later.

Context.conversation_id

This is a unique identifier assigned to mark all messages belonging to the same conversation. If you don’t supply this, or you supply an invalid one then a new one is generated for you. It has no other use except for logging.

Context.system

This is what WA uses to understand where it is in the current conversation. I won’t go into details, because it’s undocumented. One tip though! If any variable starts with an underscore (example: “_node_output_map”), this means its internal structure is fixed. So any hacks you do against this variable will likely fail on any updates.

Context.branch_exited & Context.branch_exited_reason

This will tell you if WA had to leave the branch in order to find a response for the user. It will also explain why it had to exit.

Context.private

This is a special context variable. Anything you store in this object you can hide from the conversation logging.

Very handy for storing passwords. Just be aware that if you use any of the values elsewhere they can be seen in the logs.

Everything else.

All other objects are created by you! I use a SpEL engine which analyses each variable and changes it as I need it. So be careful what you do. For example:

  • 10 + "10" = 2

You can embed code blocks into the variable values, but regardless of what it returns, it must be wrapped in a string.

  • Will work:  "id_number": ""
  • Won't work:  "id_number": 

Code blocks will only work in WA. They are treated as text if you send them from the application.

Proper feeding.

It’s important to not overfeed me. While I can hold a lot of information, it is not good to make me the session storage for your application. There are a few reasons for this.

  1. Causes more network traffic.
  2. Increases logging sizes.
  3. Doesn’t scale very well

To ensure you keep me nice and fit you can do the following.

One time context variable outputs.

If I need to pass a context variable to the application but I don’t want to see it again, then you should put it into the Output object of the response.

Object grouping

If you need to send related variables, group them within an object. For example:

Without object grouping:

"context": {
    "name": "Bob",
    "id": 12345,
    "order_id": 67890
}

With object grouping:

"context": {
    "user": {
        "name": "Bob",
        "id": 12345
    },
    "order_id": 67890
}

This allows you to drop a large number of context variables with ease when they are no longer needed.
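On the application side, dropping a whole group is then a single operation. A small sketch using the example values from the grouped layout:

```python
context = {
    "user": {"name": "Bob", "id": 12345},
    "order_id": 67890,
}

# One pop removes every variable in the group; the default avoids a KeyError
# if the group was never set.
context.pop("user", None)
```

After this, only `order_id` is left in the context.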

Variable requests

If you have a huge number of potential context variables, then you can use the request model to pull in just the variables you need from the application.

Request (from WA):

"context": {
    "request": "name,id,order_id"
}

Response (to WA):

"context": {
    "name": "Bob",
    "id": 12345,
    "order_id": 67890
}
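A rough sketch of the application side of this request model. The `fulfil_request` helper and the `store` dictionary are hypothetical names used only for illustration, and removing the `request` key before replying is my own assumption, not something the pattern requires.

```python
def fulfil_request(context, store):
    """Return a copy of `context` with the requested variables filled in.

    `context["request"]` is a comma-separated list of names, as in the example
    above. `store` is a hypothetical application-side lookup (a plain dict here).
    """
    updated = dict(context)
    requested = updated.pop("request", "")  # assumption: do not echo the request back
    for name in (part.strip() for part in requested.split(",")):
        if name and name in store:
            updated[name] = store[name]
    return updated


store = {"name": "Bob", "id": 12345, "order_id": 67890, "address": "not requested"}
response_context = fulfil_request({"request": "name,id,order_id"}, store)
```

Only the three requested variables end up in `response_context`; anything else in the store stays in the application.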

Call out to cloud functions.

Abstracting your data calls away from your application layer allows you to slot in and out updates without changing your orchestration layer. You are under a time limit to get the data, but if you can stay within 5 seconds, then this can be a better way to retrieve data and act on it.

Remembering the good times.

As a general rule, if you need to jump around the dialog tree, you should use “Jump to”, “Skip” and “Digressions”. There are approaches to get me to do it for you though.

Use these patterns with caution though, as you are moving conversation logic to the application layer. This can cause tight coupling and make bugs more likely later on.

Snapshots

This is easy enough. You just take a backup of the system object. Then overwrite the existing system object when you want to revert.
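A minimal sketch of the snapshot idea on the application side. The keys shown inside `system` are illustrative placeholders only, since the real contents are undocumented.

```python
import copy


def take_snapshot(context):
    """Return an independent copy of the system object."""
    return copy.deepcopy(context["system"])


def restore_snapshot(context, snapshot):
    """Overwrite the system object with a previously saved copy."""
    context["system"] = copy.deepcopy(snapshot)
    return context


# Placeholder contents for illustration; treat the real object as opaque.
context = {"system": {"dialog_stack": ["node_1"], "dialog_turn_counter": 1}}
saved = take_snapshot(context)

context["system"]["dialog_stack"].append("node_2")  # the conversation moves on
context["system"]["dialog_turn_counter"] = 2

restore_snapshot(context, saved)  # revert to where we were
```

The deep copy matters: a plain assignment would keep pointing at the same object, and the "backup" would change along with the live context.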

Forced jumps

This is a little trickier. You need to first traverse the tree to each jump location, then store the system objects. You can then use these system objects to jump to those areas. I would say use digressions instead if at all possible. This approach requires remapping every time there is a change to the workspace.
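As a sketch only, the stored system objects can be kept in a lookup table keyed by a label you choose. The `record_jump` and `jump_to` helpers and the `position` key are hypothetical; a real system object is opaque and has to be captured by actually walking the tree.

```python
import copy

jump_table = {}


def record_jump(label, context):
    """Store the system object captured after walking the tree to `label`."""
    jump_table[label] = copy.deepcopy(context["system"])


def jump_to(label, context):
    """Return a new context whose system object is the one recorded for `label`."""
    jumped = dict(context)
    jumped["system"] = copy.deepcopy(jump_table[label])
    return jumped


# Invented system contents, standing in for what a traversal would capture.
record_jump("billing", {"system": {"position": "billing_node"}})
record_jump("returns", {"system": {"position": "returns_node"}})

current = {"system": {"position": "welcome_node"}, "order_id": 67890}
after_jump = jump_to("billing", current)
```

The table has to be rebuilt whenever the workspace changes, which is exactly the maintenance cost mentioned above.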

… and there you have it. Look after me, I’ll make sure everything runs fine!

Making a Statement.

A lot of chat bots focus on answering the user’s question. That is great, but it can still feel a little bit clinical. You can mitigate this by defining the personality, tone and positioning of the system.

But there are nuances of conversation that can make it feel more human when talking to it.

Take this example:

Screen Shot 2018-08-08 at 5.19.25 PM

The user got an answer, but they were not really asking a question. They were just telling the system something about themselves. People can do this sometimes to initiate a conversation and make a connection.

Now to give it more of a human touch you might want the chat bot to acknowledge the statement before giving the answer.

First let’s understand what a question is. The question mark is the most obvious sign, but it is not always there. For this example, we will use some sample phrases that denote a possible question, and put them in an entity called @Question.

Screen Shot 2018-08-08 at 5.55.08 PM

You may be wondering why I haven’t covered every criterion for determining whether something is a question. Well, you can add your own. 😉 But really, in the worst-case scenario you should err on the side of answering an utterance as a question.
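To get a feel for what a phrase-based @Question entity does, here is a rough offline approximation in Python. It is not how Watson Assistant matches entities, and the phrase list is hypothetical (the real values are in the screenshot above).

```python
import re

# Hypothetical sample phrases; extend the list from what your users really say.
QUESTION_PHRASES = ["can i", "can you", "how do", "how can", "what is", "where is", "why"]

_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in QUESTION_PHRASES) + r")\b",
    re.IGNORECASE,
)


def looks_like_question(utterance):
    """True if the utterance has a question mark or contains a sample phrase."""
    return "?" in utterance or _PATTERN.search(utterance) is not None
```

For example, `looks_like_question("How do I reset my password")` is true, while `looks_like_question("I have lost my card")` is false and would be treated as a statement.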

You can also improve question detection by building contextual entities from your intents.

Once you have this done, you can start on the dialog.

dialog_0818

(1) Create a folder that looks for the absence of a question entity using !@Question.

(2) For statements you are interested in, you can just look for the same intent. We acknowledge the statement, then continue on to find the answer.

Important! Always be data driven. What I mean is don’t just create statements for every single intent. Just do the ones that are exhibited by your end users, or have a meaning.

(3) It’s not possible to jump to a folder. So the first instinct is to jump to the first dialog node in the folder. This can actually cause problems if you add new intents at the top of the tree. So this dummy node is to force the flow into the next folder naturally.

When we run it this time we get the following:

 

answer_0818

… And there you have it. Very simple, but will add a more natural feeling. It can also make the user surprised (in a good way) when a chat bot goes slightly off script.

I’ve included a sample workspace you can play with.

Negation Annotation

Another tricky (and often painful) problem with intelligent chat bots is the detection of negation. For example:

Please remove all arugula from my prosciutto Pizza

Knowing what is not wanted in that question is normally quite hard. Contextual entities to the rescue again!

Somewhat differently from the previous example, you need to train it not only on the toppings but also on what are not toppings. So we start off by creating a toppings entity.

toppings_2407.png

We now export that entity, change the CSV File so the entity name is @notoppings, then import it back in.

entities_2407.png

Next we create our intent #Order_Pizza and annotate what is and isn’t a topping. The reason for this is to prevent it trying to guess a topping that isn’t annotated.

intent_2407

So let’s test our question from earlier. You will notice that I did not add the mentioned ingredients. Nor did I have an example matching how the request is structured.

test_2407a

Pretty cool! 🙂

Although this worked quite well, I can see you would likely need a couple of similar negation examples so that the contextual entities can train better. I wouldn’t say it is much work, but it is probably something you need to test a bit more to ensure you don’t have edge cases.

 

Annotate it

So this was an interesting problem that was posed to me. Take the following intent below.

intentlist_2307.png

This intent will try to detect where someone is asking to select results by criteria. Next up let’s create the entities based on the intents. I will be using the original method of creating entities. You end up with this.

entities_2307.png

So let’s test this out…

test1_2307.png

Oh dear! It is seeing “it” as “IT Department”. This is not good.

Thankfully Watson Assistant just recently got Contextual entities. The new engine is able to understand the nature of what the entity really means, as long as you annotate it.

So going into the intent again, I have selected each word and marked it up like so:

annotate_2307

Now let’s test it again.

test2_2307.png

Now it understands that it is not the IT department. Let’s try again.

test3_2307.png

Woah!

It not only worked, but it created a new entity on the fly.

So once you teach it the patterns, it will capture the entities for you. This is currently on by default, but you should be able to toggle soon.

You still have to train it the different patterns you see. For example with the work I have done so far “Filter sales by marketing” will pick up marketing and sales. You would have to build an annotation to show what is the important term in that sentence.

Finally proper intelligence on your entities to augment your intents.

… Edit …
So someone asked what about “IT” as a department? That works too.

samplerun

Visualising Intents

I’ve always used Pandas for getting an overview of intents, but when you are dealing at the enterprise level ( > 300 intents ), it can be a case of not being able to see the wood for the trees.

Recently I saw a nice mind map visualising intent structures (shout out to Rahul! 🙂 ). It was a manual process and a lot of work went into it.

So I looked to see if we can automate this. XMind to the rescue! There is a Python library that allows you to create mind maps through code.

First I start by setting up. You can get the ctx and workspace details from your assistant.

import xmind
from xmind.core import workbook, saver
from xmind.core.markerref import MarkerId
from xmind.core.topic import TopicElement
from watson_developer_cloud import ConversationV1
from urllib.parse import urlparse, parse_qs
import pandas as pd
import os

ctx = {
    "url": "https://gateway-fra.watsonplatform.net/assistant/api",
    "username": "USERNAME",
    "password": "PASSWORD"
}

version = '2018-07-10'
workspace = 'WORKSPACE'

xmind_file = 'intents.xmind'

The XMind library will create a file if it doesn’t exist. But if the file already exists, then it adds to it. So we need to delete it before we continue.

if os.path.exists(xmind_file): os.remove(xmind_file)

This next piece of code allows you to capture all the intents directly from the workspace. In a large scale workspace, you will generally have pages of intents, so this handles that.

wa = ConversationV1( username=ctx.get('username'), password=ctx.get('password'), version=version, url=ctx.get('url'))

j = []
x = { 'pagination': 'DUMMY' }
cursor = None
while 'pagination' in x:
    x = wa.list_intents(workspace_id=workspace, export=True,cursor=cursor)
    j.append(x['intents'])
    if 'pagination' in x and 'next_cursor' in x['pagination']:
        cursor = x['pagination']['next_cursor']
    else:
        x = {}

recs = []
for i in j: 
    for k in i: 
        record = { 
            'intent': k['intent'],
            'total': len(k['examples'])
        }
        recs.append(record)

df = pd.DataFrame(recs,columns=['intent','total'])
df = df.sort_values(by=['intent'])

This last piece of code takes the dataframe of intent names and example counts, then turns it into a mind map. Each node will display the intent name and how many examples are in that intent. Intents with more than 20 examples get a green star, while those with fewer than 10 get a red star.

I am also using the first word before the underscore as the category.

x = xmind.load(xmind_file)

sheet = x.getPrimarySheet()
sheet.setTitle('Intents Summary')

root = sheet.getRootTopic()
root.setTitle('Intents')

current_id = None
for index, row in df.iterrows():
    id = row['intent'].split('_')[0]
    intent = '{} ({})'.format(row['intent'].replace('{}_'.format(id),''),row['total'])

    if id != current_id:
        topic = root.addSubTopic()
        current_id = id
        topic.setTitle(id)

    item = topic.addSubTopic()
    item.setTitle(intent)

    if row['total'] > 20:
        item.addMarker(MarkerId.starGreen)
    elif row['total'] < 10:
        item.addMarker(MarkerId.starRed)

xmind.save(x, xmind_file)
print('All done!')
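The naming and star rules in that loop can be pulled out into two small functions, which makes them easy to check without XMind or a workspace. This is a sketch with hypothetical helper names; note that `partition` splits only at the first underscore, whereas the `replace` call above removes every occurrence of the prefix.

```python
def split_intent(name):
    """Split 'Category_Rest_Of_Name' into ('Category', 'Rest_Of_Name')."""
    category, _, rest = name.partition('_')
    # An intent without an underscore is its own category and label.
    return category, rest or category


def star_for(total):
    """Mirror the thresholds above: green above 20 examples, red below 10."""
    if total > 20:
        return 'green'
    if total < 10:
        return 'red'
    return None
```

Intents with between 10 and 20 examples get no marker, the same as in the loop above.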

Using the catalog intents as an example (and intentionally modifying/removing some) you end up with something like this:

Screen Shot 2018-07-22 at 22.51.46

You can build a more complex one with the examples as well, but when you are dealing with 1000’s of questions, it gets a little unwieldy.

What is your name revisited.

As I mentioned in my previous post, Watson Assistant has a system entity called @sys-person, which allows you to capture a person’s name. One issue with this is that it is not available for every language.

In the original post I mention using entity extraction. You can still do this, but the cloud functions feature makes this so much easier.

The instructions for doing this are very well documented, so I intentionally skip over bits. Please use this as a reference.

First I created a Cloud function Action with the following code:

import sys
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions

nlu = NaturalLanguageUnderstandingV1(
    version='2017-02-27',
    url='https://gateway-fra.watsonplatform.net/natural-language-understanding/api',
    username='USERNAME',
    password='PASSWORD')


def main(dict):
    rsp = nlu.analyze(text=dict['input'], features=Features(entities=EntitiesOptions()))

    username = ''
    company = ''

    for entity in rsp['entities']:
        if entity['type'] == 'Person':
            username = entity['text']
        elif entity['type'] == 'Company':
            company = entity['text']

    response = { 
        'name': username, 
        'company': company 
    }

    return response
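If you want to check the entity-picking logic without calling the service, the loop can be lifted into a plain function and fed a hand-written list. This is a sketch: `pick_person_and_company` is a hypothetical name, and the sample data only mimics the `type` and `text` fields the code above reads.

```python
def pick_person_and_company(entities):
    """Return the last Person and Company found in an NLU-style entity list."""
    result = {'name': '', 'company': ''}
    for entity in entities:
        if entity.get('type') == 'Person':
            result['name'] = entity.get('text', '')
        elif entity.get('type') == 'Company':
            result['company'] = entity.get('text', '')
    return result


# Hand-written sample in the shape the loop expects; not real service output.
sample = [
    {'type': 'Person', 'text': 'Simon'},
    {'type': 'Location', 'text': 'Dublin'},
    {'type': 'Company', 'text': 'IBM'},
]
picked = pick_person_and_company(sample)
```

As in the action itself, a missing person or company simply comes back as an empty string.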

On the parameters page I set up two parameters, “input” and “language“. The language parameter is there to allow for other languages where @sys-person may not exist.

On the end point page, you need to copy the API key and split it into its name and password parts (name:password), as per the instructions link. Keep a note of it.
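Splitting the key is a one-liner, but here is a small sketch that also guards against a key with no colon. The `split_api_key` helper is hypothetical, and the dictionary keys are chosen to match the `mycreds` object used in the node JSON.

```python
def split_api_key(api_key):
    """Split 'user:password' at the first colon into a credentials dict."""
    user, sep, password = api_key.partition(':')
    if not sep:
        raise ValueError('expected a key in the form user:password')
    return {'user': user, 'password': password}


# Placeholder value, not a real key.
creds = split_api_key('USERNAME:PASSWORD')
```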

Now in Watson Assistant create a node that triggers, and add the following JSON. Replace username/password with the ones from the cloud function. Alternatively, use the proper credentials formatting.

{
    "context": {
        "mycreds": {
            "user": "USERNAME",
            "password": "PASSWORD"
        },
        "nlu_response": ""
    },
    "output": {
        "text": {
            "values": [],
            "selection_policy": "sequential"
        }
    },
    "actions": [
        {
            "name": "simon_test_area/nlu_lookup",
            "type": "server",
            "parameters": {
                "input": "<? input.text ?>",
                "language": "en"
            },
            "credentials": "$mycreds",
            "result_variable": "$nlu_response"
        }
    ]
}

This will execute the cloud function and return the name and company (if they exist). Have this node skip to a child node which will execute the response. For my sample I have:

Name: $nlu_response.name<br>Company: $nlu_response.company

This is what you get back.

Screen Shot 2018-07-22 at 22.25.17

Very simple and very powerful. Combine this with Watson Knowledge Studio and you can build intelligence for your domain.

Six months later…

I had planned to create an update frequently, but life and more importantly work got in the way. With the new role, a lot of focus is on the other aspects of AI related technologies.

Of course while things are different for me, the chat bot world continues on. Watson Conversation has become Watson Assistant, with a huge number of updates and changes.

My blog continues to be a source for numerous people starting with Watson Assistant. But Watson Development has made changes that make most of it redundant.

So until my next update (soon, I promise) let me give a brief update to every blog entry and give Watson Development the credit they deserve.

Note: I mention Beta a lot. In your workspace UI, you now are able to request access to the Watson Assistant Beta, and try out all the new features that are coming. I also didn’t reference every blog entry. If it’s not in the list it’s probably still the same, or not important enough to mention.

Testing your Intents

K-Fold and blind testing are still very much a part of ensuring you have trained the system well. But as those who are currently playing with the Beta will know, there are features coming that reduce this need or even make it redundant (jury is still out on this. I’m favoring the latter).

There are a number of Watson Assistant K-Fold and testing apps up on github, if you don’t want to try to decipher my example.

Pushing my Buttons

While I can see a need for buttons in some cases, I still believe it destroys the conversational aspect of a chat bot. Overuse turns your chat bot into an app, and damages training. I am also against having in a conversational text response.

Chihuahua or Muffin, revisited.

Huge updates have been made to Watson Visual Recognition (VisRec). The main one is that it is now integrated into Watson Studio. If you haven’t tried Watson Studio yet, go now! 🙂 It is a Data Science / Machine Learning / Deep Learning development platform.

Watson VisRec still continues to amaze people at how quickly and accurately it can classify custom content. We also have Watson Media, which can annotate live video. If RAW POWER is needed, there is PowerAI Vision, which allows for real-time classification on video. We are talking “Person of Interest” level classification. 🙂

I have no confidence in Entities.

I have nothing but love for Entities now! Gone are the “keyword” type entities. Now they are built using a ML NLP model. Not only that, it also helps to dramatically improve the training of Intents.

There are now Pattern Entities as well. These allow for complex regular expressions.

The design pattern of using entities to lower the confidence of an intent is still valid.

Manufacturing Intent

Creating manufactured questions is still very much an issue you would want to avoid.

That said, you may have seen Project Debater. While this is personal opinion (and no guarantee it will ever be a feature), I can see this technology augmenting the intent and answer creation of Watson Assistant.

Anaphora? I hardly knew her.

This is still very much a good and well used pattern. I’d love it if Watson Development wrapped it into Watson Assistant, but for now it is an easy way to add intelligence to your conversation.

Removing the confusion in Intents.

While this can still help in training, the current beta has features which will likely make this redundant.

Watson Conversation just got turned up to 11.

This should be “Watson Assistant” 😉

Slots have continued to improve. You can now create more complex slot responses and handlers, allowing you to jump to nodes within the slot itself rather than create a conditional tree.

Digressions is also a new feature which allows you to dictate when the user can go off script, and pull them back.

I love Pandas

While I still love using Pandas, Watson Assistant logging analytics has improved dramatically. Again there are features coming in the Beta which make this even better for training your enterprise level chat bots.

For those in Premium IBM Cloud there is also a feature for it to recommend topics you have never seen before, and to help collate questions for intent training of new topics.

I have a Dream …

Since writing that blog post, Watson Speech to Text now uses Deep Learning to understand human speech. There is also a feature of “Acoustic models”, which allows you to train on accents in relation to your domain language.

Compound Questions

This is still a good pattern for compound questions. The current beta features may make this redundant in certain designs.

Improving your Intents with Entities.

With the huge improvements to Entities, I would consider this an anti-pattern. So ignore.

Conversing in three dimensions.

This is still a valid pattern. In fact I’ve seen it used in a number of very successful implementations. Again though, playing with the beta will show this may become redundant from a coding perspective.

Data Science Experience

It’s now “Watson Studio”! There are so many new features in this. The most important being the support of Deep Learning.

Since I wrote that blog you have the following new features (may be more).

  • SPSS modeler. Similar to the desktop version, but allows you to use the raw power of IBM cloud. You can still import/export your streams between the desktop version.
  • Data Refinery. There is a quote that 60% of all data science work is cleaning up data. This helps in doing this.
  • Watson Machine Learning. You give it your data and tell it what you want to use to determine the outcome. It will then run through multiple machine learning models to find the best one for your data, something that normally requires an ML expert even for mundane stuff. It even creates the API and a test UI for you.
  • Experiments. Allows you to run numerous models and tweak to find the best one for your data.
  • Neural Network Modeler. Allows you to build your tensorflow / pytorch / keras / caffe models in a simple GUI. It will then write your code which you can export to your applications. Here is a good article on it.

Much more than this as well. Try it out, it’s free. 🙂

Building a Conversation interface in minutes.

Watson Assistant now has the ability to create a number of conversation interfaces through the workbench (little to no coding in some instances). For example, Facebook and Slack.

Understanding how a conversation flows

This is redundant. Actually I can argue that most of what I wrote in 2016 is redundant now.

Conversation now has Folders, which allow you to check a branch of nodes but then continue the flow of the tree, instead of falling back to root.

Nodes now have a “Skip Input”, which means you don’t have to put in a Jump to get into the branch (which is prone to breaking if you have to add more nodes).

Digressions allow you to jump around looking for the answer and return to where you left off.

… So there you have it. Hopefully everyone is up to date. 🙂 See ya soon.

Testing your intents

So this really only helps if you are doing a large number of intents, and you have not used entities as your primary method of determining intent.

First let’s talk about perceived accuracy, and what this is trying to solve. Perceived accuracy is where someone will type in a few questions they know the answer to. Then, depending on their manual test, they perceive the system to be working or failing.

It puts the person training the system into a false sense of how it is performing.

If you have done the Watson Academy training for Conversation, you will have heard it mention K-fold testing. For this blog post I’m going to skip the details, as I briefly covered them before.

K-fold cross validation: You split your training set into K random segments. Use one segment to test and the rest to train. You then work your way through all of them. This method will test everything, but it is extremely time-consuming. You also need to pick a good size for K so that you can test correctly.

K-Fold works well by itself if you have a large training set that has come from real-world representative users. You will find this rarely happens, so you should use it in conjunction with a blind set.
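The splitting step on its own can be sketched in a few lines of plain Python. This is a bare-bones illustration with made-up data, not the notebook referenced below: it shuffles once, cuts the data into K segments, and does not try to keep every intent represented in each fold (something you would want in practice).

```python
import random


def k_fold_splits(examples, k, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    if not 2 <= k <= len(examples):
        raise ValueError('k must be between 2 and the number of examples')
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal segments
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Made-up (question, intent) pairs purely for illustration.
data = [('question {}'.format(n), 'intent_{}'.format(n % 3)) for n in range(10)]
splits = list(k_fold_splits(data, k=5))
```

Each of the five splits would then be used to train on `train` and score on `test`, with the results averaged across folds.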

Previously I didn’t cover how you actually do the test. So with that, here is the notebook giving a demonstration:

 

 

Pushing My Buttons

So this is a long time pet peeve, but recently I have seen a load of these in succession. I am sure a lot of people who know me are going to read this and think “He’s talking about me”. Truth is there is no one person I am pointing my finger at.

Let me start with what triggered this post. Have a look at this screen shot. There are three things wrong with it, although one of the reasons is not visible, but you can guess.

poorUXdesign

So, disclosure: this is a competitor’s chat bot, and it is also a common pattern I have seen on that chat bot. But I have also seen people do this with Watson Conversation.

Did you guess the issues? 

Issue 1: Never ask the end user whether you answered them correctly. If your system is well trained and tested, then you are going to know if it answered well or not.

Those who think of a rebuttal to this, imagine you rang a customer support person and they asked you “Did I answer you correctly” every time they gave an answer? What would your action be? More than likely you would ask to speak to someone who does know what they are talking about.

If you really need to get feedback, make it subtle, or ask for a survey at the end.

Issue 2: BUTTONS. I don’t know who started this button trend, but it has to die. You are not building a cognitive conversational system. You are building an application. You don’t need an AI for buttons, any average developer can build you a button based “Choose your own adventure“.

Issue 3: Not visible on the image is that you are stuck until you click on yes or no. You couldn’t say yes or no, or “I am not sure”. For that matter I have seen cases where the answer is poorly written and the person would take the wrong answer as right, so what happens then?  For that matter selecting yes or no does nothing to progress the conversation.

So what is the root cause in all this?  From what I have seen normally it is one thing.

Developers.

Because older chat bots required a developer to build, it has sort of progressed along those lines for some time. In fact some chat bot companies tout the fact that it is developer orientated, and in some cases only offer code based systems.

I’ve also gotten to listen to some developers tell me how Watson Conversation sucks (because “tensor flow”), or they could write better. I normally tell them to try.

Realistically to make a good chat bot, the developer is generally far down the food chain in that creation. Watson Conversation is targeted at your non-technical person.

Here’s a little graphic to help.

beep beep robot

Now, getting all of these is hard, but the people you do get should have some skills in these areas. Let’s expand on each one.

Business Analyst

By far the most important, certainly at the start of the project.

Most failed chat bot projects are because someone who knows the business hasn’t objectively looked at what it is you are trying to solve, and if it is even worth the time.

By the same token, I have seen two business analysts create a conversational bot that on the face of it looked simple, but they could show that it saved over a million euros a year. All built in a day and a half. Because they knew the business and where to get the data.

Conversational Copywriter

Normally even getting a copywriter makes a huge difference, but one with actual conversational experience makes the solution shine. It’s the difference between something clinical, and something your end user can make an emotional attachment to.

Behavioural Scientist

Another thing I see all the time. You get an issue in the chat conversation that requires some complexity to solve. So you have your developer telling you how they can build something custom and complex to solve the issue (probably includes tensor flow somewhere in all of it).

Your behavioural expert on the other hand will suggest changing the message you tell the end user. It’s really that simple, but often missed by people without experience in this area.

Subject Matter Expert (SME)

To be fair, at least on projects I’ve seen there is normally an SME there. But there are still different levels of SMEs. For example your expert in the material, may not be the expert that deals with the customer.

But it is dangerous to think that just because you have a manual you can reference, you are capable of building a system that can answer questions as if it is an SME.

Data Scientist

While you might not need a full-blown one, all good conversational solutions are data driven: in what people ask, the behaviours exhibited and the needs met. Having someone able to sift through the existing data and make sense of it helps make a good system.

Also, on almost every engagement I’ve been on, people will tell you what they think the end user will say or do. But that is often not the case, and the data shows this.

UI/UX

What the conversational copywriter does for the engaging conversation, the UI/UX does for the system. If you are using existing channels like Facebook, Skype, Messenger, Slack, etc., then you probably don’t need to worry as much. But without good UX it’s still possible to create something that can upset the user.

It’s also a broad skill area. For example, UX for Web is very different to Mobile, IVR, and Robots.

Machine Learning

Watson conversation abstracts the ML layer from the end user. You only need to know how to cluster questions correctly. But knowing how to do K-Fold cross validation, or the importance of blind sets helps in training the system well.

It also helps if your developers have at least a basic understanding of machine learning.

I often see non-ML developers trying to fix clusters with comments like “It used this keyword 3 times, so that’s why it picked this over that”, which is not how it works at all.

It also prevents your developers (if they code the bot) from creating something that is entity heavy. Non-ML developers seem to like entities, as they can wrap their heads around them. Fixed keywords and regex all make sense to a developer, but in the long run they make the system unmaintainable (which basically defeats the purpose of using Watson Conversation).

Natural Language Processing (NLP)

I’ve made this the smallest. There was a time, certainly with the early versions of Watson, when you needed these skills. Not so much anymore. Still, it’s good to understand the basics of NLP, certainly for entities.

Developer

In the scheme of things, there will always be a place for the developer.

You have UI development, application layer, back-end and integration, automation, testing, and so on.

Just development skills alone will not help you in building something that the end user can feel a connection to.

… and please, stop using buttons. 

Chihuahua or Muffin, revisited.

I just finished reading Maria Yao’s article Chihuahua OR Muffin? Searching For The Best Computer Vision API. It’s a fun read, but I felt it didn’t really show off the power of Watson Visual Recognition.

For the demo in the article, the general classifier was being used.

One of the main advantages of Watson Visual recognition is that you can create your own custom classifiers. It is very simple too.

First, you need data.

Using Maria’s article, I pulled the Chihuahua and Muffin pictures from ImageNet.

Like most data, it tends to need a bit of cleaning. So I deleted any images below 14KB in size. The reason for this was the majority at that size were just corrupted. I also went through and deleted any images which were adverts or “this image is no longer there” banners.

In total, that was 500 images deleted. It still left 3,000 images to play with.

Next I created a Visual Recognition service. For this I created the free version. This limited me to 250 events a day. So I had to lower my training sets to 100 pictures from each set.

I took a random 100 from each. I didn’t examine the photos at all, but here are a few to give you an idea of how the images look.
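The clean-up and sampling steps are easy to script. A sketch with hypothetical helper names and folder layout: it treats 14KB as 14 * 1024 bytes (an assumption) and skips small files rather than deleting them, so nothing on disk is changed. Removing adverts and "image no longer there" banners still needs a human eye.

```python
import random
from pathlib import Path

MIN_BYTES = 14 * 1024  # the 14KB cut-off mentioned above, assumed to mean KiB


def usable_images(folder, min_bytes=MIN_BYTES):
    """Return files in `folder` at least `min_bytes` in size, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.stat().st_size >= min_bytes)


def pick_training_set(folder, count=100, seed=None):
    """Randomly choose up to `count` usable images without looking at them."""
    candidates = usable_images(folder)
    rng = random.Random(seed)
    return rng.sample(candidates, min(count, len(candidates)))
```

Calling `pick_training_set` once per class folder with `count=100` gives the two sets of 100 described here.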

example_training

As you can see, no thought put into worrying about other items in the picture.

I zipped the images up, and created a classifier like so.

classifier

Then I clicked create, and waited a little over 10 minutes for it to analyse the pictures.

Once the classifier was finished training, I was ready to test. As some may not be aware, Visual Recognition also offers a food classifier. So for my first two tests I tried my classifier, General and Food.

test1

So you can see the red bar on the classifier I made. This is more because I only gave 100 examples. As you give more training examples, its confidence increases. But you can see that the difference between Muffin and Chihuahua is clear.

You can also see the food classifier got it as well.

What about the Chihuahua?

test2

As you can see, all three do quite well on classification. But what about the original pictures which look similar? I ran those through and ended up with this.

results2

As you can see it got them all right! None of these were used to train against.

As demos go though this is simple and fun. But with good classified images, it can be scary accurate for proper real world use cases.

Having said that, I did have one failure. Testing the samples on Maria’s page, it was able to understand the cookie monster muffin and the man holding the chihuahua.

But the muffin in the plastic bag with the Chihuahua it could not get. I tried cropping out the dog, but it still failed, although with a lower confidence. I suspect this is a combination of training and a bad quality photo.

Screen Shot 2017-09-28 at 17.40.17