Watson in the black and white room.

Let’s talk about the recent changes to how Watson determines its confidence. It seems to be a hot topic at the moment, and probably not well understood.



Imagine that you are Watson, sitting in a room with no doors or windows. Everything you know about the world, you have learned from Wikipedia. In front of you are two objects: a cube and a pyramid.

Now, if someone asks you a question, you can use Wikipedia to try to figure out the answer, but you can only point to one of the two objects in the room. There is no other answer.

So they may ask: “Which one is an orange?” You may think that the cube is similar to the Discovery Cube in Orange County. You can also see that a food pyramid has an orange in it. Neither is a direct fit, but you only have two answers.

So you respond: “I am 51% sure that it is this pyramid”


Now you are in the same room, but this time there is a window that shows you the outside world.

You are asked the same question. You still come to the same conclusion, but because you can see the outside world you know that the answer is not in the room.

This time you respond: “I am confident that neither of these objects is an orange.”
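The two rooms can be made concrete with a toy calculation. This is an illustration built on assumptions, not a description of Watson’s internals (which this post does not cover): the windowless room is modelled as a softmax over only the candidates present, which forces the confidences to sum to 1, and the room with a window as an independent sigmoid score per candidate, which lets every answer be low at once. The raw scores are invented.

```python
import math

# Invented raw scores for how well each object in the room matches "orange".
raw_scores = {"pyramid": -2.0, "cube": -2.04}


def closed_world(scores):
    """Windowless room: softmax over only the candidates present, so the
    confidences are forced to sum to 1."""
    exps = {name: math.exp(s) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def open_world(scores):
    """Room with a window: each candidate gets an independent sigmoid score,
    so all of them can be low at the same time."""
    return {name: 1.0 / (1.0 + math.exp(-s)) for name, s in scores.items()}


if __name__ == "__main__":
    for name, conf in closed_world(raw_scores).items():
        print(f"closed  {name:8s} {conf:.2f}")  # pyramid 0.51, cube 0.49
    for name, conf in open_world(raw_scores).items():
        print(f"open    {name:8s} {conf:.2f}")  # pyramid 0.12, cube 0.12
```

Both rooms weigh exactly the same evidence; only the way the scores are normalised differs.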

But what about the lower confidence?

The first thing you notice is that the confidence is not as high as before. This in itself is not a bad thing. What matters is how the top answer relates to the other answers found. For example:


In this example the first answer is 72%, while the next one is 70%. This means either it is a compound question, or you need more training to differentiate between two intents that are close together. In the previous version you could not see this.
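One way to catch this in your own application is to compare the top two confidences and flag the input when they are too close. This sketch assumes intents arrive as a list of `{"intent": ..., "confidence": ...}` objects, like the `intents` array in a Watson Conversation message response; the intent names and the 0.05 margin are hypothetical values to tune against your own data.

```python
def is_ambiguous(intents, margin=0.05):
    """Return True when the top two intents are within `margin` of each other.

    `margin` is a hypothetical tuning value, not a Watson setting.
    """
    ranked = sorted(intents, key=lambda i: i["confidence"], reverse=True)
    if len(ranked) < 2:
        return False
    return ranked[0]["confidence"] - ranked[1]["confidence"] < margin


# The 72% / 70% case from the example; the intent names are made up.
intents = [
    {"intent": "order_status", "confidence": 0.72},
    {"intent": "order_cancel", "confidence": 0.70},
]
print(is_ambiguous(intents))  # True
```

A `True` result is a prompt to check whether the input is a compound question or whether the two intents need more distinguishing training examples.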

The main point to take from this: the confidence hasn’t actually changed. You are just finally seeing the real confidence.

How does this impact me?

First, Watson has always ignored an intent if its confidence is below 0.2. But given how confidences were previously determined, it was rare to hit this condition.

Now this is possible.

Also, if you have written conditions to determine the real confidence boundary (detailed here), you will need to re-determine the correct boundaries.
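One way to re-determine a boundary, assuming you have a set of labelled test utterances, is to run them through your workspace, record the top confidence and whether the top intent was correct, and pick the cut-off that best separates right answers from wrong ones. This is a minimal sketch under that assumption; the `best_threshold` helper and the sample data are invented for illustration, and accuracy is only one possible criterion.

```python
def best_threshold(results):
    """Pick the confidence cut-off that maximises accuracy on labelled results.

    `results` is a list of (top_confidence, was_correct) pairs from your own
    test runs. A result counts as handled correctly when it is accepted and
    right, or rejected and wrong.
    """
    candidates = sorted({conf for conf, _ in results})
    best, best_score = 0.0, -1
    for t in candidates:
        score = sum((conf >= t) == correct for conf, correct in results)
        if score > best_score:
            best, best_score = t, score
    return best


# Invented example data: (top intent confidence, was the top intent right?)
sample = [(0.91, True), (0.74, True), (0.55, True), (0.48, False),
          (0.35, False), (0.62, True), (0.41, False), (0.29, False)]
print(best_threshold(sample))  # 0.55
```

Re-run this kind of check whenever the confidence scale changes, since a boundary tuned on the old numbers will not carry over.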

Lastly, if no intent is matched, you get an empty intents list.
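Code that reads the top intent should therefore guard against an empty list. A minimal sketch: the `route` helper and the `anything_else` fallback name are hypothetical, and `service_filter` only mimics the below-0.2 cutoff described above so the example is self-contained; in practice your application receives a list that has already been filtered.

```python
CUTOFF = 0.2  # intents below this are ignored, per the behaviour described above


def service_filter(intents):
    """Mimic the service dropping low-confidence intents (illustration only)."""
    return [i for i in intents if i["confidence"] >= CUTOFF]


def route(intents, fallback="anything_else"):
    """Return the top intent name, or a fallback when nothing matched."""
    if not intents:
        return fallback
    return max(intents, key=lambda i: i["confidence"])["intent"]


# "Which one is an orange?": every candidate scores low, so nothing survives.
low = [{"intent": "pyramid", "confidence": 0.12},
       {"intent": "cube", "confidence": 0.11}]
print(route(service_filter(low)))  # anything_else
```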

In closing

Although the new feature is considerably better, always test before you deploy!

As for the title reference: 
