Compound Questions

One problem that is tricky to solve is if a user has asked two questions. Previously some solutions were to look for conjunctions (“and”) or question marks. Then try to guess if it is a question.

But you could end up with a question like “Has my dog been around other dogs and other people?”. This is clearly one question.

With the new conversation feature of “Absolute Confidences”, it is now possible to detect this. Earlier versions of conversation would have all intents would add up to 1.0.

Now each confidence has it’s own value. Taking the earlier example, if we map the confidences to a chart, we get:

conv060217-1

Visually we can see that the first and second intent are not related. The next sentence “Has my dog been around other dogs and is it certified?” is two questions. When we chart this we see:

conv060217-2

Very easy to see that there are two questions. So how to do it in your code?

You can use a clustering technique called K-means. This will cluster your data into sets of ‘K’. In this case we have “important intents” and “unimportant intents”. Two groups, means K = 2.

For this demonstration I am going to use Python, but K-means exists in a number of languages. I have a sample of the full code, and example conversation workspace. So for this I will only show code snippets.

Walkthrough

Conversation request needs to set alternate_intents to true. So that you can get access to the top 10 intents.

Once you get your response back, convert your confidence list into an array.

intent_confidences = list(o['confidence'] for o in response['intents'])

Next the main method will return True if it thinks it is a compound question. It requires numpy + scipy.

def compoundQuestion(intents):
    v = np.array(intents)
    codebook, _ = kmeans(v,2)
    ci, _ = vq(v,codebook)

    # We want to make everything in the top bucket to have a value of 1.
    if ci[0] == 0: ci = 1-ci
    if sum(ci) == 2: return True
    return False

The first three lines will take the array of confidences and generate two centroids. A centroid is the mean of each cluster found. It will then group each of the confidences into one of the two centroids.

Once it runs ci will look something like this: [ 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 ] . This however can be the reverse.

The first value is the first intent. So if the first value is 0 we invert the array and then add up all the values:

[ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 ] => 2 

If we get a value of 2, then the first two intents are related to the question that was entered. Any other value, then we only have one question, or potentially more than two important intents.

Example output from the code:

Has my dog been around other dogs and other people?
> Single intent: DOG_SOCIALISATION (0.9876400232315063)

Has my dog been around others dogs and is it certified?
> This might be a compound question. Intent 1: DOG_SOCIALISATION (0.7363447546958923). Intent 2: DOG_CERTIFICATION (0.6973928809165955).

Has my dog been around other dogs? Has it been around other people?
> Single intent: DOG_SOCIALISATION (0.992318868637085)

Do I need to get shots for the puppy and deworm it?
> This might be a compound question. Intent 1: DOG_VACCINATIONS (0.832768440246582). Intent 2: DOG_DEWORMING (0.49955931305885315).

Of course you still need to write code to take action on both intents, but this might make it a bit easier to handle compound questions.

Here is the sample code and workspace.

2 thoughts on “Compound Questions

  1. I’m a bit of a newbie to this, so apologies if my understanding is incorrect. Is it possible for vq to return something like [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]? In this case, it should recognize it as potentially a compound question, but it will invert and not see it.

    • In this case the first 8 intents are related. It would likely point more to poor training than a real compound question.

Leave a Reply