Watson in the black and white room.

Let’s talk about the recent changes to how Watson determines its confidence. It seems to be a hot topic at the moment, and probably not well understood.

 

Before: 

Imagine that you are Watson: you are in a room with no doors or windows. You have learned everything about the world from Wikipedia. There are two objects in front of you, a cube and a pyramid.

Now if someone asks you a question, you can use Wikipedia to try and figure out the answer, but you can only point to one of the two objects in the room. There is no other answer.

So they may ask “Which one is an orange?”. You may think that a cube is similar to the Discovery Cube in Orange County. You can also see that a food pyramid has an orange in it. Neither is a direct fit, but you only have two answers.

So you respond: “I am 51% sure that it is this pyramid”

After:

Now you are in the same room, but this time there is a window that shows you the outside world.

You are asked the same question. You still come to the same conclusion, but because you can see the outside world you know that the answer is not in the room.

This time you respond: “I am confident that neither of these objects is an orange”

But what about the lower confidence?

The first thing you will notice is that the confidence is not as high as before. This in itself is not a bad thing. What matters is the relationship of the answer to the other answers found. For example:

[Image: conv060217-2 (intent confidences with two close top intents)]

You can see in this example that the first answer is at 72%, while the next one is at 70%. So either it is a compound question, or you need more training to differentiate between two intents that are close together. In the previous version you could not see this.

The main point to take from this is that the confidence hasn’t actually changed. You are just finally seeing the real confidence.

How does this impact me?

First, Watson will always ignore an intent if its confidence is below 0.2. But because of how the confidences were previously determined, it was rare that you would hit this condition.

Now this is possible.

Also, if you have written conditions to determine the real confidence boundary (detailed here), you will need to re-determine the correct boundaries.

Lastly, if no intent is matched, you get an empty intents list.
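For example, a minimal defensive check, assuming response is the parsed JSON from the message call (handle_irrelevant_input is a hypothetical fallback of your own):

intents = response.get('intents', [])
if not intents:
    # Nothing cleared the threshold - fall back to a default reply.
    handle_irrelevant_input(response)
else:
    top_intent = intents[0]['intent']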

In closing

Although the new feature is considerably better, always test before you deploy!

As for the title reference: it’s a nod to the “Mary in the black and white room” thought experiment.

Compound Questions

One problem that is tricky to solve is when a user asks two questions at once. Previously, some solutions were to look for conjunctions (“and”) or question marks, and then try to guess whether it is really two questions.

But you could end up with a question like “Has my dog been around other dogs and other people?”. This is clearly one question.

With the new conversation feature of “Absolute Confidences”, it is now possible to detect this. In earlier versions of conversation, all intent confidences would add up to 1.0.

Now each confidence has its own value. Taking the earlier example, if we map the confidences to a chart, we get:

[Image: conv060217-1 (chart of intent confidences for a single question)]
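If you want to reproduce a chart like this yourself, here is a minimal matplotlib sketch; the confidence values below are made up for illustration:

import matplotlib.pyplot as plt

# Made-up confidences for the top 10 intents of a single question.
confidences = [0.99, 0.12, 0.09, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]

plt.bar(range(1, len(confidences) + 1), confidences)
plt.xlabel('Intent rank')
plt.ylabel('Confidence')
plt.show()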

Visually we can see that the first and second intent are not related. The next sentence “Has my dog been around other dogs and is it certified?” is two questions. When we chart this we see:

[Image: conv060217-2 (chart of intent confidences for a compound question)]

It is very easy to see that there are two questions. So how do you do this in your code?

You can use a clustering technique called K-means. This will cluster your data into ‘K’ sets. In this case we have “important intents” and “unimportant intents”. Two groups means K = 2.
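As a quick illustration of the idea (the confidence values below are made up), K-means finds two centroids and assigns each confidence to the nearest one:

import numpy as np
from scipy.cluster.vq import kmeans, vq

# Made-up confidences: two strong intents, eight weak ones.
confidences = np.array([0.74, 0.70, 0.15, 0.12, 0.10, 0.08, 0.07, 0.05, 0.04, 0.02])

centroids, _ = kmeans(confidences, 2)   # two cluster means (order not guaranteed)
labels, _ = vq(confidences, centroids)  # which centroid each confidence is closest to
print(centroids, labels)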

For this demonstration I am going to use Python, but K-means exists in a number of languages. I have a sample of the full code and an example conversation workspace, so here I will only show code snippets.

Walkthrough

Your Conversation request needs to set alternate_intents to true, so that you get access to the top 10 intents.
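Here is a minimal sketch of such a request using the raw REST API and the requests library; the URL, version date, workspace ID and credentials are placeholders for your own service instance:

import requests

# Placeholders - substitute your own workspace ID and service credentials.
url = ('https://gateway.watsonplatform.net/conversation/api/v1/workspaces/'
       'YOUR_WORKSPACE_ID/message?version=2017-05-26')

payload = {
    'input': {'text': 'Has my dog been around other dogs and is it certified?'},
    'alternate_intents': True  # return the top 10 intents, not just the best one
}

response = requests.post(url, json=payload, auth=('USERNAME', 'PASSWORD')).json()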

Once you get your response back, convert your confidence list into an array.

intent_confidences = list(o['confidence'] for o in response['intents'])

Next the main method will return True if it thinks it is a compound question. It requires numpy + scipy.

import numpy as np
from scipy.cluster.vq import kmeans, vq

def compoundQuestion(intents):
    # Cluster the confidences into two groups: "important" and "unimportant".
    v = np.array(intents)
    codebook, _ = kmeans(v, 2)
    ci, _ = vq(v, codebook)

    # We want everything in the top bucket to have a value of 1.
    if ci[0] == 0: ci = 1 - ci
    # Exactly two intents in the top bucket -> likely a compound question.
    if sum(ci) == 2: return True
    return False

The first three lines of the function take the array of confidences and generate two centroids. A centroid is the mean of each cluster found. Each confidence is then assigned to one of the two centroids.

Once it runs, ci will look something like this: [ 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 ]. However, the labels can also be the reverse.

The first value corresponds to the first intent. So if the first value is 0, we invert the array and then add up all the values:

[ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ] => 2 

If we get a value of 2, then the first two intents are related to the question that was entered. Any other value means we only have one question, or potentially more than two important intents.
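Concretely, with a made-up label array:

import numpy as np

ci = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])  # first intent landed in cluster 0
if ci[0] == 0:
    ci = 1 - ci      # flip so the first intent's cluster is labelled 1
print(sum(ci))       # 2 -> exactly two intents in the top bucket: compound question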

Example output from the code:

Has my dog been around other dogs and other people?
> Single intent: DOG_SOCIALISATION (0.9876400232315063)

Has my dog been around other dogs and is it certified?
> This might be a compound question. Intent 1: DOG_SOCIALISATION (0.7363447546958923). Intent 2: DOG_CERTIFICATION (0.6973928809165955).

Has my dog been around other dogs? Has it been around other people?
> Single intent: DOG_SOCIALISATION (0.992318868637085)

Do I need to get shots for the puppy and deworm it?
> This might be a compound question. Intent 1: DOG_VACCINATIONS (0.832768440246582). Intent 2: DOG_DEWORMING (0.49955931305885315).

Of course you still need to write code to take action on both intents, but this might make it a bit easier to handle compound questions.
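As a hypothetical sketch of how the pieces might fit together (handle_intent is a made-up handler of your own):

intent_confidences = list(o['confidence'] for o in response['intents'])

if compoundQuestion(intent_confidences):
    # Act on the top two intents.
    for intent in response['intents'][:2]:
        handle_intent(intent['intent'], response)
else:
    handle_intent(response['intents'][0]['intent'], response)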

Here is the sample code and workspace.