Pushing My Buttons

So this is a long-time pet peeve, but recently I have seen a load of these in succession. I am sure a lot of people who know me are going to read this and think "he's talking about me". The truth is, there is no one person I am pointing my finger at.

Let me start with what triggered this post. Have a look at this screenshot. There are three things wrong with it, although one of them is not visible; you can probably guess it.

poorUXdesign

So, disclosure: this is a competitor's chat bot, and it is a common pattern I have seen on it. But I have also seen people do this with Watson Conversation.

Did you guess the issues? 

Issue 1: Never ask the end user whether you answered them correctly. If your system is well trained and tested, then you already know whether it answered well.

For those thinking of a rebuttal: imagine you rang a customer support person and they asked you "Did I answer you correctly?" every time they gave an answer. What would you do? More than likely you would ask to speak to someone who does know what they are talking about.

If you really need to get feedback, make it subtle, or ask for a survey at the end.

Issue 2: BUTTONS. I don't know who started this button trend, but it has to die. You are not building a cognitive conversational system; you are building an application. You don't need an AI for buttons: any average developer can build you a button-based "Choose your own adventure".

Issue 3: Not visible in the image is that you are stuck until you click yes or no. You cannot type "yes", "no", or "I am not sure". I have also seen cases where the answer is poorly written and the person takes a wrong answer as right, so what happens then? And selecting yes or no does nothing to progress the conversation.

So what is the root cause of all this? From what I have seen, it is normally one thing.

Developers.

Because older chat bots required a developer to build them, things have progressed along those lines for some time. In fact some chat bot companies tout being developer-oriented, and in some cases only offer code-based systems.

I've also listened to developers tell me how Watson Conversation sucks (because "tensor flow"), or that they could write better. I normally tell them to try.

Realistically, to make a good chat bot, the developer is generally far down the food chain in its creation. Watson Conversation is targeted at the non-technical person.

Here's a little graphic to help.

beep beep robot

Now, your chances of getting all of these are slim, but the people you do get should have some skills in these areas. Let's expand on each one.

Business Analyst

By far the most important, certainly at the start of the project.

Most chat bot projects fail because someone who knows the business hasn't objectively looked at what it is you are trying to solve, and whether it is even worth the time.

By the same token, I have seen two business analysts create a conversational bot that on the face of it looked simple, but they could show that it saved over a million euros a year. All built in a day and a half. Because they knew the business and where to get the data.

Conversational Copywriter

Normally even getting a copywriter makes a huge difference, but one with actual conversational experience makes the solution shine. It’s the difference between something clinical, and something your end user can make an emotional attachment to.

Behavioural Scientist

Another thing I see all the time. You get an issue in the chat conversation that requires some complexity to solve. So you have your developer telling you how they can build something custom and complex to solve the issue (probably includes tensor flow somewhere in all of it).

Your behavioural expert on the other hand will suggest changing the message you tell the end user. It’s really that simple, but often missed by people without experience in this area.

Subject Matter Expert (SME)

To be fair, at least on projects I've seen, there is normally an SME present. But there are different levels of SME. For example, your expert in the material may not be the expert who deals with the customer.

But it is dangerous to think that, just because you have a manual you can reference, you are capable of building a system that can answer questions as if it were an SME.

Data Scientist

While you might not need a full-blown one, all good conversational solutions are data driven: what people ask, the behaviours they exhibit, and the needs they have met. Having someone able to sift through the existing data and make sense of it helps make a good system.

Also, on almost every engagement I've been on, people will tell you what they think the end user will say or do. Often that is not the case, and the data shows it.

UI/UX

What the conversational copywriter does for an engaging conversation, the UI/UX designer does for the system. If you are using existing channels like Facebook, Skype, Messenger, Slack, etc., then you probably don't need to worry as much. But without good UX it's still possible to create something that upsets the user.

It’s also a broad skill area. For example, UX for Web is very different to Mobile, IVR, and Robots.

Machine Learning

Watson Conversation abstracts the ML layer away from you; you only need to know how to cluster questions correctly. But knowing how to do k-fold cross-validation, or understanding the importance of blind sets, helps in training the system well.

It also helps if your developers have at least a basic understanding of machine learning.

I often see non-ML developers trying to fix clusters with comments like "it used this keyword 3 times, so that's why it picked this over that", which is not how it works at all.

It also prevents your developers (if they code the bot) from creating something that is entity heavy. Non-ML developers seem to like entities, as they can wrap their heads around them. Fixed keywords and regexes all make sense to a developer, but in the long run they make the system unmaintainable (basically defeating the purpose of using Watson Conversation).

Natural Language Processing (NLP)

I’ve made this the smallest. There was a time, certainly with the early versions of Watson you needed these skills. Not so much anymore. Still, it’s good to understand the basics of NLP, certainly for entities.

Developer

In the scheme of things, there will always be a place for the developer.

You have UI development, application layer, back-end and integration, automation, testing, and so on.

Just development skills alone will not help you in building something that the end user can feel a connection to.

… and please, stop using buttons. 

Anaphora? I hardly knew her.

One of the most common requests for Conversation is being able to understand the running topic of a conversation.

For example:

USER: Can I feed my goldfish peas?

WATSON: Goldfish love peas, but make sure to remove the shells!

USER: Should I boil them first?

The "them" in the second question is an anaphora: it refers to the peas. So you can't answer the question without first knowing the previous one.

On the face of it, this looks easy. But you have "goldfish", "peas", and "shells", any of which could potentially be the reference, and no one wants to boil their goldfish!

So the tricky part is determining the topic. There are a number of ways to approach this.

Entities

The most obvious way is to determine what entity the person mentioned, and store that for use later. This works well if the user actually mentions an entity to work with. However, in a general conversation the subject may not always be mentioned by the person who asks the question.

Intents

When a question is asked and the intent determined, there may not always be an entity involved. So intents are of limited help in this regard.

That said, there are certain cases where intents have been used with a context in mind. This can be done easily by adding a suffix to the intent name. For example:

    #FEEDING_FISH_e_Peas

In this case we believe that peas is a common entity that has a relationship to the intent of Feeding Fish. For coding convention we use “_e_” to denote that the following piece of the intent name is an entity identifier.

At the application layer, you can do a regex on the intent name “_e_(.*?)$” for the group 1 result. If it is not blank, store it in a context variable.
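As a minimal sketch of that application-layer step (the function and variable names here are my own, not part of Watson Conversation):

import re

def entity_from_intent(intent_name):
    # Pull the entity identifier out of an intent name such as
    # 'FEEDING_FISH_e_Peas'. Returns None when no '_e_' suffix is present.
    match = re.search(r'_e_(.*?)$', intent_name)
    return match.group(1) if match else None

topic = entity_from_intent('FEEDING_FISH_e_Peas')
if topic:
    context = {'topic': topic.lower()}   # send this back on the next message call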

 

Regular Expressions

Like before, you can use regular expressions to capture an earlier pattern to store it at a later point.

One way to approach this is have a gateway node that activates before working through the intent tree. Something like this:

example1507

The downside is that a complex regular expression brings a level of maintenance complexity with it.

You can make maintenance at least a little easier by setting the primary condition check to "true" and then putting the individual checks in the node itself.

example1507a
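As a rough sketch of what one of those individual checks might store (the topic terms are invented, and this assumes the extract() string method from the Conversation expression language), the node's JSON could set a context variable from the matched term:

"context": {
    "topic": "<? input.text.extract('(goldfish|peas|shells)', 1) ?>"
}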

Answer Units

An answer unit is the text response you give back to the end user. Once you have responded with an answer, you have created a lot of context within that answer that the user may follow up on. For example:

example1507b

Even with the context markers in the answer, the end user may never pick up on them. So it is very important to craft your answer so that it drives the user towards the context you have selected.

NLU

The last option is to pass the questions through NLU. This should give you the key terms and phrases to store as context, as well as information for building a knowledge graph.

I have the context. Now what?

When the user asks a question that lacks context, you will normally get back low-confidence intents, or an irrelevant response.

If you are using intent-based context, you can check the returned intents for a context similar to what you have stored. This also allows you to discard unrelated intents. The results are not always stellar, but it is a cheaper, single call.

The other option you can take is to preload the question that was asked and send it back. For example:

  PEAS !! Can I boil them first?

You can use the !! as a marker that your question is trying to determine context. Handy if you need to review the logs later.
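A small sketch of that re-ask at the application layer (send_to_conversation() is a hypothetical helper for your message call, and the stored topic is whatever you captured earlier):

def re_ask_with_context(question, stored_topic):
    # Prefix the stored topic so the workspace can classify the follow-up.
    # The '!!' marker makes these synthetic questions easy to spot in the logs.
    loaded_question = '{} !! {}'.format(stored_topic.upper(), question)
    return send_to_conversation(loaded_question)   # hypothetical helper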

As time passes…

As the conversation presses on, what the person is talking about can drift away from the original context, yet that context may still remain the dominant one. One solution is to build a weighted context list.

For example:

"entity_list" : "peas, food, fish"

In this case we maintain the last three contexts found. As a new context is found, the oldest one is pushed off the list. Of course this means more API calls, which can cost money.
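A minimal way to keep such a rolling list at the application layer (the variable names are my own):

from collections import deque

recent_context = deque(maxlen=3)   # keeps only the last three topics found

for topic in ['fish', 'food', 'peas']:
    recent_context.append(topic)   # the oldest entry drops off automatically

entity_list = ', '.join(reversed(recent_context))   # -> "peas, food, fish"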

Lowering calls on the tree.

Another option is to create a poor man's knowledge graph. Let's say the last two contexts were "bowl" and "peas". Rather than creating multiple context nodes, you can build a tree which can be passed back to the application layer.

"entity" : "peas->food->care->fish"
...
"entity" : "bowl->care->fish"

You can use something like Tinkerpop to create a knowledge graph (IBM Graph in Bluemix is based on this).
example1507c

Now when a low confidence question is found, you can use “bowl”, “peas” to disambiguate, or use “care” as the common entity to find the answer.
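Without a full graph database you can get a long way by simply intersecting the stored paths; a rough sketch:

def common_entity(path_a, path_b):
    # Find the first shared node in two 'child->parent' entity paths,
    # e.g. 'peas->food->care->fish' and 'bowl->care->fish' -> 'care'.
    nodes_b = set(path_b.split('->'))
    for node in path_a.split('->'):
        if node in nodes_b:
            return node
    return None

print(common_entity('peas->food->care->fish', 'bowl->care->fish'))   # care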

Talk… like.. a… millennial…

One more common form of anaphora you have to deal with is how people talk on instant messaging systems. The question is often split across multiple lines.

example1507d

Normal conversation systems take one entry and give one response, so this just wrecks their AI heads: not only do you need to know where the real question stops, but also where the next one starts.

One way to approach this is to capture the average time between each entry from the user. You can do this by passing timestamps from the client to the back end, which can then build an average of how the user talks. This needs to be done at the application layer.
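One rough way to do this, assuming the client sends a timestamp with each message (the class and method names are my own):

import time

class TurnDetector:
    # Tracks the average gap between a user's messages and waits roughly
    # that long before treating the buffered lines as one complete question.
    def __init__(self):
        self.last_ts = None
        self.gaps = []

    def add(self, timestamp):
        if self.last_ts is not None:
            self.gaps.append(timestamp - self.last_ts)
        self.last_ts = timestamp

    def likely_finished(self, now=None):
        now = now if now is not None else time.time()
        if not self.gaps or self.last_ts is None:
            return True
        average_gap = sum(self.gaps) / len(self.gaps)
        return (now - self.last_ts) > average_gap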

Sadly no samples this time, but this should give you some insight into how context is handled in a conversation system.

Speech to Text and Conversation

I thought I would take a moment to play with Speech to Text and a utility that was released a few months ago.

The Speech to Text Utils allows you to train S2T using your existing conversational system. To give a quick demo, I got my son to ask about buying a puppy.

I set up some quick Python code to print out results:

import json
from watson_developer_cloud import SpeechToTextV1

# ctx is the Service credentials dict copied from the S2T service.

s2t = SpeechToTextV1(
    username=ctx.get('username'),
    password=ctx.get('password')
)

def wav(filename, **kwargs):
    # Read the audio file and return the top transcript, or '???' if
    # nothing was recognised.
    with open(filename, 'rb') as audio_file:
        response = s2t.recognize(audio_file, content_type='audio/wav', **kwargs)

    if len(response['results']) > 0:
        return response['results'][0]['alternatives'][0]['transcript']
    return '???'

So testing the audio with the following code:

wav_file = 'p4u-example1.wav'
print('Broadband: {}'.format(wav(wav_file)))
print('NarrowBand: {}'.format(wav(wav_file,model='en-US_NarrowbandModel')))

Gets these results:

Broadband: can I get a puppy 
NarrowBand: can I get a puppy

Of course the recording is crystal clear, which is why we get such a good result. So I added some ambient noise from SoundJay to the background, so that it now sounds like it was recorded in a subway.

Running the code above again gets these results.

Broadband: Greg it appropriate 
Narrowband: can I get a phone

Ouch!

Utils to the rescue!

So the purpose of asking about a puppy is that I have a sample conversation system that is about buying a dog. Using that conversation file I did the following.

1: Installed Speech to Text Utils.

2: Before you begin you need to set up the connection to your S2T service (using service credentials).

watson-speech-to-text-utils set-credentials

It will walk you through the username and password.

3: Once that was set up, I then tell it to create a customisation.

watson-speech-to-text-utils corpus-from-workspace puppies4you.json

You need to map to a particular model. For testing, I attached it to en-US_NarrowbandModel and en-US_BroadbandModel.

4: Once it was run, I get the ID numbers for the customisations.

watson-speech-to-text-utils customization-list

Once I have the IDs, I try the audio again:

wav_file='p4u-example2.wav'
print('Broadband: {}'.format(wav(wav_file,customization_id='beeebd80-2420-11e7-8f1c-176db802f8de',timestamps=True)))
print('Narrowband: {}'.format(wav(wav_file,model='en-US_NarrowbandModel',customization_id='a9f80490-241b-11e7-8f1c-176db802f8de')))

This outputs:

Broadband: can I get a puppy 
Narrowband: can I get a phone

So the broadband model now works. For narrowband, the audio quality is likely too poor to work with. There are also more specialised language models for children, created by others, to cope with this.

One swallow does not make a summer.

So this is one example of one phrase. For real testing, you should test the whole model. In a demonstration from development, this approach increased an S2T model's accuracy from around 50% to over 80%.

 

 

Improving your Intents with Entities.

You might notice that when you update your entities, Conversation says "Watson is training on your recent changes". What is happening is that intents and entities work together in the NLU engine.

So it is possible to build entities that can be referenced within your intents, similar to how Dialog entities work.

For this example I am going to use two entities.

  • ENTITY_FOODSTUFF
  • ENTITY_PETS

In my training questions I create the following example.

conv-150118-1

The #FoodStore question list is exactly the same, only the entity name is changed.

Next up, create your entities. It doesn't matter what the entity itself is called, only that it has one value that mentions the entity identifiers above. I have @Entity set to the same as the value for clarity.

conv-150118-2

conv-150118-3

 

"What is the point?", you might ask. Well, you will notice that both entities have a value of "fish".

When I ask “I want to get a fish” I get the following back.

  • FoodStore confidence: 0.5947581078492985
  • Petshop confidence: 0.4052418921507014

So Watson is not sure, as both intents could be the right answer. This is what you would expect.

Now we delete the "fish" value from both entities and instead add the same training question "I want a fish" to both intents. After Watson has trained, asking "I want to get a fish" gets the following back.

  • Petshop confidence: 0.9754140796608233
  • FoodStore confidence: 0.02458592033917674

Oh dear, now it appears to be more confident than it should be. So entities can help keep questions ambiguous when training is not helping.

This is not without its limitations.

Entities are fixed keywords, and the intents will treat them as such. So while it will find “fish” in our example, it won’t recognise “fishes” unless it’s explicitly stated in the entity.

Another thing to be wary of is that all entities are used in the intents. So if a question mentioned “toast”, then @ENTITY_FOODSTUFF becomes a candidate in trying to determine which intent is correct.

The last thing to be aware of is that training questions take priority over entities when it comes to determining what is correct.

If we were to add a training question "I want fishes" to the first intent and then ask the earlier question, you would find that FoodStore now takes priority. If we add "I want fishes" to both intents and ask "I want to get a fish", you will get the same results as if the entities never had the word "fish" in them.

This can be handy for forcing common spelling mistakes that may not be picked up, or clearly defined domain keywords a user may enter (e.g. a product ID).

 

Prioritizing Intents

A common question that comes up is how to handle the case where the end user makes two utterances, but you only want to take action on one.

The most common being someone saying hello, versus saying hello with a question. You would want the question to take priority.

It’s very easy to do. You just do the following:

conv-150117-0

  1. Create your first node with a condition of True and create your priority intents under this. Set your top node to jump to the first in that branch.
  2. Create your second node which handles greetings.
  3. Add a True node at the end of your important intents, and let it jump to greeting condition.

And that's it! But that is the old-style Conversation way. Just before the new year a new version of Conversation was released that makes this much simpler.

conv-150117-1

The magic is in the first node.

conv-150117-2

Here we check to ensure that a greeting hasn’t been mentioned, then check each important intent.

With this method you don't need any complex branches or jumping around. One important thing is to ensure that your less important intents do not have any training data that may cause them to be picked over the important intents.
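Assuming the greeting intent is called #greetings and the important ones are #account_help and #billing_question (names invented for illustration), that first node's condition might look something like:

!#greetings && (#account_help || #billing_question)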

Sample workspaces available.

 

 

Conversing in three dimensions.

There is one feature of Conversation that many people don’t even factor in when creating a conversational system. Let’s take the standard plan to spell it out.

  • Unlimited API queries/month
  • Up to 20 workspaces
  • Up to 2000 intents
  • Shared public cloud

Yep, you have 20 workspaces to play with! Most people starting off just use it for development, testing and production. But there is so much more. Putting it in context.

complexityofconversation

Functional Actions

For those new to Conversation, the first experience is normally the car demo. This is a good example of functional actions. In this case you know your application, and you want your end user to interact with it. So the user normally has prompts (conversational or visual) to allow them to refer to your user interface.

These offer the least resistance when building. Users are taught the names of the interfaces by using them, and are unlikely to deviate from that language. In fact, if they do use non-domain language, it is more likely a fault of your user interface.

Question & Answers

This is where you have collected questions from the end user, to determine what the answer/action that is needed to be taken.

Often the end user does not understand the domain language of the documentation or business. So training on their language helps make the system better for them.

Process Flows

This is where you need to converse with the user, to collect more information to drive to meeting their needs.

Multiple Workspaces

Most see this as just creating a workspace for development, testing and production. But using these as part of your overall architecture can dramatically increase the functionality and accuracy of the system.

Two main patterns have been used: off-topic and drill-down.

Off-Topic / Chit chat.

In this model we have a primary workspace which stores the main intents, as well as an intent for off-topic + chit-chat. Once one of these is detected, a second call is made out to the related workspace.

uml0117

From a price point of view this works well if you are calling out to a subject matter that is asked infrequently. If the user will often ask these questions though, then the drill down method is a better solution.
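As a sketch of that application-layer routing using the Python SDK (the workspace IDs and the #offtopic intent name are placeholders; note that the message parameter is called message_input in older SDK releases and input in newer ones):

from watson_developer_cloud import ConversationV1

conversation = ConversationV1(username='YOUR-USERNAME',
                              password='YOUR-PASSWORD',
                              version='2017-05-26')

MAIN_WS = 'main-workspace-id'           # placeholder IDs
CHITCHAT_WS = 'chitchat-workspace-id'

def ask(text, context=None):
    response = conversation.message(workspace_id=MAIN_WS,
                                    message_input={'text': text},
                                    context=context)
    if response['intents'] and response['intents'][0]['intent'] == 'offtopic':
        # Only when the primary workspace flags off-topic do we pay for a second call.
        response = conversation.message(workspace_id=CHITCHAT_WS,
                                        message_input={'text': text})
    return response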

Drill Down.

This model is where the user asks a question which has a more expanded field going forward. For example when you enter a bank you may ask the information desk about mortgages, who will direct you to another desk to go into more detail on your questions.

uml0117a

For this to work well, you need clear separation of what each workspace does, so that an off-topic intent is triggered to pass control back to the main workspace.

When planning your model, look for common processes versus entities. In the example above it might not be good to separate by pet, as pets share common questions with only the entity differing. But you could separate by purchasing, accessories, and so on.

As long as the conversation does not switch topics often, costs are kept down.

Multiple Workspace Calls.

This is not a recommended model as your costs go way up. It was originally used when there was a 500 intent limit per space (NLC).

uml0117b

If money is no object, then this model works where you may have more than 2000 intents, or a number of intents that share similar patterns but need to be distinguished from each other.

You need to factor in whether your Conversation service is returning relative or absolute confidences. If relative, then responses are relative to their own workspace and not comparable to each other.

If you do have numerous intents, it may be easier and better to use a different solution, like Discovery Service for example.

 

Message shaping

While the cognitive ability of Conversation is what sets it apart from other chat bots, the skills in message shaping can even the odds. It is a common technique used in customer support when you need to give a hard message.

From a Conversation point of view, it allows you to dramatically reduce the level of work required to build and maintain the system, as well as improving customer satisfaction.

So let's start with what message shaping is. Take this example: you own a pet store chat bot and the user says:

I wish to buy a pet.

You could start with "What kind of pet?". Here you have left the user's response too open. For a cognitive system this isn't an issue on the face of it, as a well-trained Conversation will handle it well.

But even if it does it still leaves you open to a lot of responses.

  1. “What pets are there?” – Simple answer. Minimal work.
  2. “What pet suits me?” – A process flow to determine this. Medium work.
  3. “I want to buy a panda” – An endangered species. Again possible simple answer.
  4. “I want to buy a puppy” – More details required. Medium work.
  5. “I haven’t thought about it” – Could be simple or complex.
  6. “A pet that is popular for millennials” – Now it starts getting crazy.

You will be driven to insanity trying to cater for every possible response coming back from the user. Even if you get a good response like "I want to buy a puppy", you may need to walk back and forth with the user, only to find that you don't have that pet in the store to sell them.

So you can reduce complexity by taking control of the conversation. First you need to examine what the user hasn’t said. They haven’t said what kind of pet. This means they are unsure on what they want to get.

As a pet store owner, you know that certain pets are good for certain living conditions. So you can reduce and control the direction by saying something like:

I see you are maybe unsure about the pet you want. I can recommend a low-maintenance pet, which is good for apartments or busy lifestyles, or a high-maintenance pet, which is good for families with a home.

Here you have given two options which dramatically narrow the scope of the next response from the user. It is still possible someone may go off script, but it is unlikely. If they do you can do a conversational repair to force them back into the script.

As the flow progresses, you can push the user to pets that you are trying to sell faster.

In doing so, however, it is important to understand that the end user must have free will, even if it is an illusion. For example, if the person wants a puppy, it may be that a certain breed is not available. Rather than saying they can't have that breed, offer other breeds.

If you give no options to the user, it leads to frustration. Even options which are not what the person wants are still better than no options. In fact, if you shape your messages well you can give two options which lead to the same outcome, and the end user will still feel like they are in control.

Shaping through UI

Another advantage of message shaping is avoiding having to code complex language rules (Intents, Entities, Regex, etc).

For example:

conv-3010a

Here you can see that the end user has not supplied all of the credit card information. You would need to code complex flows to cater for this. The information is also clearly visible, and it could become a nightmare to parse and anonymise.

To solve all of this you can use the UI to force the user to structure their own data.

conv-3010b

Watson Virtual Agent actually does this out of the box for a number of common areas.

Buttons are for apps, not for conversing.

For UI related prompts, try not to overdo it. For structured data it is fine. For buttons it can also be fine, but if you overdo it then it does not feel intelligent to the end user. As it starts to feel more like an application, the users have different expectations of the responses they get back.

Practise makes perfect

Don’t be fooled that this is easy to do. Most developers I have seen work with conversation fall back to looking for a technical solution, when only changing how you speak to the end user will suffice.

Even people working in support can take 6 months to a year to pick up the skills from no experience. Although it can be a bit harder having to do it on the fly versus creating a script.

For more reading on techniques, here is some stuff to get you started.

New System Entities feature

If you have gone into your Conversation service since yesterday, you will find you have a new feature called system entities. For the moment you only have number, percentage and currency recognition.

For this introduction I am just going to use @sys-number entity to make Watson do simple math.

First let’s train Watson about the math terms we are going to use. For this we are going to use intents.

conv-2210

Why intents and not entities? Hands down, intents mean very little training is needed for Watson to understand similar terms. Also, since system entities are entities too, they can interfere with any logic you put in. I use the number "42" so as not to bias the classification towards a particular number.

Next we go to entities and switch on the @sys-number entity.

Now for the dialog. First we want to make sure that what the person has said is a valid math question; if not, we say we don't understand. We do this with the following conditional statement.

intents.size() >0 
AND intents[0].confidence < 0.30

This ensures the system only answers when it is confident enough to do so; anything below that threshold gets the "don't understand" response. Next we put in another node which checks whether the system is unsure.

intents.size() >0 
AND intents[0].confidence < 0.60
AND entities.size() == 2

Now you will notice we are using entities.size(). This is because the numbers are entities, and @sys-number doesn’t have the size() method. We want to make sure that the end user typed in two numbers before continuing.

Now that we have done that, the procedure is more or less the same for each action, so here is just the addition.

The answer is <? @sys-number[0].numeric_value + @sys-number[1].numeric_value ?>

This takes the first and second numeric values and adds them. While Conversation will recognise numbers written as words and act on them, it won't always do this, so we use the numeric_value attribute.

While this mostly works, there are issues you won't easily be able to cater for, for example the importance of a number's location.

Take for example these two questions, which ask for the same thing but will give very different answers from the system.

  • What is ten divided by two?
  • Using the number two divide up the number ten.

One way to solve this is just to create a second division intent which knows the numbers are reversed, but more likely you can solve this with message shaping.

You will find a similar issue when you start to use the other system entities. For example, if you have @sys-percentage active, then "20%" is not only a percentage but also a number. This makes it trickier when trying to read the entity or @sys-number stack.

For what comes back from conversation, you will see that the entity structure has changed.

"entities":[{
    "entity":"sys-number",
    "location":[17,32],
    "value":"200",
    "metadata":{
        "numeric_value":200
    }
}

Now you have a metadata field which you can get the numeric value from.
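Pulling those values out at the application layer is then straightforward; a small sketch, assuming response holds the parsed JSON from the message call:

numbers = [e['metadata']['numeric_value']
           for e in response['entities']
           if e['entity'] == 'sys-number']
# For "What is ten plus 200?" this would give [10, 200]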

As always, here is a sample conversation script.

The road to good intentions.

So let’s talk about intents. The documentation is not bad in explaining what an intent is, but doesn’t really go into its strengths, or the best means to collect them.

First, the important thing to understand with intents: how Watson perceives the world is defined by its intents. If you ask Watson a question, it can only understand it in relation to those intents. It cannot answer a question in a context it has not been trained on.

So for example, "I want to get a fishing license" may work for what you trained, but "I want to get a driving license" may give you the same response, simply because it closely matches while falling outside what your application is intended for.

So it is just as important to understand what is out of scope but may still need an answer.

Getting your questions for training.

The strength of intents is the ability to map your customers' language to your domain language. I can't stress this enough. While Watson can be quite intelligent in understanding terms from its training, it is making those connections to language that does not directly relate to your domain that is important.

This is where you can get the best results. So it is important to collect questions in the voice of your end-user.

The "voice" can also mean where and how the question was asked. How someone asks a question on the phone can be different to instant messaging. How you plan to create your application determines how you should capture those questions.

When collecting, make sure you do not accidentally bias the results. For example, if you have a subject matter expert collecting, you will find they unconsciously change the question when writing it down. Likewise, if you collect questions from surveys, try to avoid asking questions which will bias the results. Take these two examples.

  • “Ask questions relating to school timetables”
  • “You just arrived on campus, and you don’t know where or what to do next.”

The first one will generate a very narrow scope of test questions related to your application, not what a person would ask when in the situation. The second is broader, but you may still find that people say things like "campus", "where", "what".

Which comes first? Questions or Intents?

 

If you have defined the intents first, you need to get the questions for them. However there is a danger that you are creating more work for yourself than needed.

If you do straight question collection, when you start to cluster into intents you will start to see something like this:

Longtail.png

Everything right of the orange line (the long tail) does not have enough questions to train Conversation. Now you could go out and try to find questions for the long tail, but that is the wrong way to approach this.

Focus on the left side (the fat head); this is the most common stuff people will ask. It will also allow you to work on a very well-polished user experience which most users will hit.

The long tail still needs to be addressed, and if you have a fully flat line then you need to look at a different solution, for example Retrieve & Rank. There is an example that uses both.

Manufacturing Intent

Creating manufactured questions is never ideal, but there may be instances where you need to do it. It has to be done carefully. Watson is pretty intelligent when it comes to understanding a cluster of questions, but the person who creates those questions may not speak the way the customer does (even if they believe they do).

Take these examples:

  • What is the status of my PMR?
  • Can you give me an update on my PMR?
  • What is happening with my PMR?
  • What is the latest update of my PMR?
  • I want to know the status of my PMR.

Straight away you can see "PMR", which is a common term for an SME but may not be for the end user. Nowhere does it mention what a PMR is. You can also see "update" and "status" repeated, which is unlikely to be an issue for Watson but doesn't create much variance.

Test, Test, Test!

Just like a human that you teach, you need to test to make sure they understood the material.

Get real world data!

After you have clustered all your questions, take out a random 10%-20% (depending on how many you have). You set these aside and don’t look at the contents. This is normally called a “Blind Test”.

Run it against what you have trained and get the results. These should give you an indicator of how it will react in the real world*. Even if the results are bad, do not look at why.

Instead, create one or more of the following tests to see where things are going wrong.

Test set: Similar to the blind test, you remove 10%-20% and use that to test (don't add it back until you get more questions). You should get results pretty close to your blind test, and you can examine them to see why it's not performing. The problem with a test set is that you are reducing the size of the training set, so if you have a low number of questions to begin with, the next two tests help.

K-fold cross-validation: You split your training set into K random segments, use one segment to test and the rest to train, and then work your way through all of them. This method tests everything, but is extremely time-consuming. You also need to pick a good size for K so that you can test correctly.

Monte Carlo cross-validation: In this instance you take out a random 10%-20% (depending on training set size) and test against it. Normally you run this test at least three times and take the average. It is quicker to test. I have a sample Python script which can help you here.
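The splitting itself is simple; here is a sketch of a single Monte Carlo run, where questions is a list of (text, intent) pairs and classify() is a placeholder for whatever calls your trained workspace:

import random

def monte_carlo_run(questions, holdout=0.15):
    # Hold out ~15% of the ground truth at random and measure accuracy.
    shuffled = list(questions)
    random.shuffle(shuffled)
    cut = max(1, int(len(shuffled) * holdout))
    test, train = shuffled[:cut], shuffled[cut:]
    # ... train a workspace on `train` here, then score the held-out set:
    correct = sum(1 for text, intent in test if classify(text) == intent)
    return correct / float(len(test))

# Run it at least three times and average the results.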

* If your questions were manufactured, then you are going to have a problem testing how well the system is going to perform in real life!

I got the results. Now what?

First, check the results of your blind test versus whatever test you did above. They should fall within 5% of each other. If not, then your system is not correctly trained.

If this is the case, you need to look at the clusters of the incorrectly answered questions, and also the clusters that produced those wrong answers. Factor in the confidence of the system as well, and look for patterns that explain why it picked the wrong answer.

More on that later.

 

Building a Conversation interface in minutes.

I come from a Java development background, but since joining Watson I’ve started using Python and love it. 🙂 It’s like it was made for Conversation.

The Conversation test sidebar is handy, but sometimes you need to see the raw data, or certain parts that don’t show up in the side bar.

Creating a Bluemix application can be heavyweight if you just want to do some testing of your conversation. Python allows you to test with very little code. Here are some easy steps to get you started. I am assuming you already have a Conversation service and workspace created.

1: If you are using a Mac, you have Python already installed. Otherwise you need to download it from python.org.

2: Install the Watson Developer Cloud SDK. You can also just use Requests, but the SDK will make your life easier.

3: In your Conversation service, copy the service credentials as-is (if you are using the latest UI). If they don't look like the example below, you may need to alter them.

conv0310-2

4: Go to your conversation workspace, and check the details to get your workspace ID. Make a note of that.

5: Download the following code.

conv0310-1

For the "ctx" part, just paste in your service credentials and update the workspace ID. The version number you can get from the Conversation API documentation.
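If you can't grab the download, a minimal reconstruction of that script might look like this (my own sketch, not the original; note the message parameter is named message_input in older releases of the Python SDK and input in newer ones):

import json
from watson_developer_cloud import ConversationV1

ctx = {
    "username": "YOUR-SERVICE-USERNAME",
    "password": "YOUR-SERVICE-PASSWORD"
}
workspace_id = 'YOUR-WORKSPACE-ID'

conversation = ConversationV1(username=ctx['username'],
                              password=ctx['password'],
                              version='2017-05-26')

try:
    get_input = raw_input   # Python 2
except NameError:
    get_input = input       # Python 3

context = {}
while True:
    text = get_input('> ')
    if text == '...':
        break
    response = conversation.message(workspace_id=workspace_id,
                                    message_input={'text': text},
                                    context=context)
    context = response.get('context', {})    # carry the conversation state forward
    print(json.dumps(response, indent=2))    # raw response, not just the output text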

6: Run the Python code. Assuming you put in the correct details, you can type into the console and get your responses back from conversation. Just type “…” to quit.

conv0310-3