The road to good intentions.

So let’s talk about intents. The documentation is not bad in explaining what an intent is, but doesn’t really go into its strengths, or the best means to collect them.

First, the important thing to understand about intents: how Watson perceives the world is defined by its intents. If you ask Watson a question, it can only understand it in relation to those intents. It cannot answer a question if it has not been trained on the context.

So for example, “I want to get a fishing license” may work for what you trained, but “I want to get a driving license” may give you the same response, simply because it closely matches your training even though it falls outside of what your application is intended for.

So it is just as important to understand what is out of scope but may still need an answer.

Getting your questions for training.

The strength of intents is the ability to map your customer’s language to your domain language. I can’t stress this enough. While Watson can be quite intelligent in understanding terms from its training, it is making those connections for language which does not directly relate to your domain that is important.

This is where you can get the best results. So it is important to collect questions in the voice of your end-user.

The “voice” can also mean where and how the question was asked. How someone asks the question on the phone can be different to instant messaging. How you plan to deliver your application determines how you should capture those questions.

When collecting, make sure you do not accidentally bias the results. For example, if you have a subject matter expert collecting, you will find they will unconsciously change the question when writing it. Likewise, if you collect questions from surveys, try to avoid asking questions which will bias the results. Take these two examples.

  • “Ask questions relating to school timetables”
  • “You just arrived on campus, and you don’t know where or what to do next.”

The first one will generate a very narrow scope of test questions related to your application, and not what a person would ask when in a situation. The second question is broader, but you may still find that people will echo words like “campus”, “where”, “what”.

Which comes first? Questions or Intents?


If you have defined the intents first, you need to get the questions for them. However there is a danger that you are creating more work for yourself than needed.

If you do straight question collection, when you start to cluster into intents you will start to see something like this:


Everything right of the orange line (long tail) does not have enough to train Conversation. Now you could go out and try and find questions for the long tail, but that is the wrong way to approach this.

Focus on the left side (fat head). This is the most common stuff people will ask. It will also allow you to work on a very well polished user experience which most users will hit.

The long tail still needs to be addressed, and if you have a full flat line then you need to look at a different solution. For example Retrieve & Rank. There is an example that uses both.

Manufacturing Intent

Creating manufactured questions is almost always a bad thing. There may be instances where you need to do this, but it has to be done carefully. Watson is pretty intelligent when it comes to understanding the cluster of questions. But the user who creates those questions may not speak in the way of the customer (even if they believe they do).

Take these examples:

  • What is the status of my PMR?
  • Can you give me an update on my PMR?
  • What is happening with my PMR?
  • What is the latest update of my PMR?
  • I want to know the status of my PMR.

Straight away you can see “PMR”, which is a common term for an SME but may not be for the end-user. Nowhere does it mention what a PMR is. You can also see “update” and “status” repeated, which is unlikely to be an issue for Watson but doesn’t really create much variance.

Test, Test, Test!

Just as with a human you teach, you need to test Watson to make sure it understood the material.

Get real world data!

After you have clustered all your questions, take out a random 10%-20% (depending on how many you have). You set these aside and don’t look at the contents. This is normally called a “Blind Test”.
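As a rough sketch of setting aside a blind set, assuming your clustered questions are held as (question, intent) pairs in a Python list. The function name `split_blind_set`, the fixed seed and the sample data are all made up for illustration.

```python
import random

def split_blind_set(questions, fraction=0.1, seed=42):
    """Return (training, blind) with roughly `fraction` of the items held out."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(questions)
    # A fixed seed keeps the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    blind_size = max(1, round(len(shuffled) * fraction))
    return shuffled[blind_size:], shuffled[:blind_size]

# Each item is a (question, intent) pair; the data here is made up.
questions = [("question %d" % i, "intent_%d" % (i % 4)) for i in range(50)]
training, blind = split_blind_set(questions, fraction=0.2)
print(len(training), len(blind))
```

Write the blind portion to its own file straight away, so you are not tempted to read it while working on the training set.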

Run it against what you have trained on and get the results. These should give you an indicator of how it reacts in the real world*. Even if the results are bad, do not look into why.

Instead you can create one or more of the following tests to see where things are going weird.

Test Set : Similar to the blind test, you remove 10%-20% and use that to test (don’t add back until you get more questions). You should get pretty close results to your blind test. You can examine the results to see why it’s not performing. The problem with the test set is that you are reducing the size of the training set, so if you have a low number of questions to begin with, then the next two tests help.

K-fold cross validation : You split your training set into K random segments. Use one segment to test and the rest to train. You then work your way through all of them. This method will test everything, but will be extremely time-consuming. Also you need to pick a good size for K so that you can test correctly.

Monte Carlo cross validation : In this instance you take out a random 10%-20% (depending on train set size) and test against it. Normally run this test at least 3 times and take the average. It is quicker to run than K-fold. I have a sample Python script which can help you here.
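The two cross validation approaches differ only in how the splits are produced, which a minimal sketch can make concrete. This is not the sample script mentioned above. It assumes a hypothetical `train_and_score(train, test)` callable that trains on one list and returns accuracy on the other; `majority_baseline` is a toy stand-in for it, so the example runs without a Watson service.

```python
import random
from collections import Counter

def k_fold_splits(examples, k, seed=0):
    """Yield k (train, test) pairs; every example is tested exactly once."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal segments
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

def monte_carlo_splits(examples, runs=3, fraction=0.2, seed=0):
    """Yield `runs` independent random (train, test) splits."""
    rng = random.Random(seed)
    test_size = max(1, round(len(examples) * fraction))
    for _ in range(runs):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        yield shuffled[test_size:], shuffled[:test_size]

def average_score(splits, train_and_score):
    """Run the scorer on every split and return the mean score."""
    scores = [train_and_score(train, test) for train, test in splits]
    return sum(scores) / len(scores)

def majority_baseline(train, test):
    """Toy scorer: always predict the most common intent seen in training."""
    most_common = Counter(intent for _, intent in train).most_common(1)[0][0]
    return sum(1 for _, intent in test if intent == most_common) / len(test)

# Made-up (question, intent) pairs, purely to exercise the functions.
data = [("question %d" % i, "weather" if i % 3 else "other") for i in range(30)]
print(average_score(k_fold_splits(data, k=5), majority_baseline))
print(average_score(monte_carlo_splits(data, runs=3), majority_baseline))
```

In practice the scorer would create or update a workspace with the training portion and send each test question to it, which is why K-fold gets expensive as K grows.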

* If your questions were manufactured, then you are going to have a problem testing how well the system is going to perform in real life!

I got the results. Now what?

First check your results of your blind test vs whatever test you did above. They should fall within 5% of each other. If not then your system is not correctly trained.
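A small sketch of that comparison, assuming “within 5%” means five percentage points of accuracy and that each result is recorded as an (expected intent, predicted intent) pair. Both are assumptions made for this example.

```python
def accuracy(results):
    """Fraction of (expected_intent, predicted_intent) pairs that match."""
    return sum(1 for expected, predicted in results if expected == predicted) / len(results)

def consistent(blind_results, test_results, tolerance=0.05):
    """True when blind and test accuracy differ by at most `tolerance`."""
    return abs(accuracy(blind_results) - accuracy(test_results)) <= tolerance

# Made-up results: 8 of 10 correct on the blind set, 9 of 10 on the test set.
blind_results = [("status", "status")] * 8 + [("status", "update")] * 2
test_results = [("status", "status")] * 9 + [("status", "update")]
print(accuracy(blind_results), accuracy(test_results))
print(consistent(blind_results, test_results))
```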

If this is the case, you need to look at the cluster each wrongly answered question came from, and also at the clusters that were wrongly picked as the answer. You need to factor in the confidence of the system as well. You should look for patterns that explain why it picked the wrong answer.

More on that later.


Building a Conversation interface in minutes.

I come from a Java development background, but since joining Watson I’ve started using Python and love it. 🙂 It’s like it was made for Conversation.

The Conversation test sidebar is handy, but sometimes you need to see the raw data, or certain parts that don’t show up in the sidebar.

Creating a Bluemix application can be heavy if you just want to do some testing of your conversation. Python allows you to test with very little code. Here are some easy steps to get you started. I am making an assumption you have

1: If you are using a Mac you have Python already installed. Otherwise you need to download from

2: Install the Watson Developer Cloud SDK. You can also just use Requests, but the SDK will make your life easier.

3: In your conversation service, copy the service credentials as-is (if you are using the latest UI). If it doesn’t look like below, you may need to alter it.


4: Go to your conversation workspace, and check the details to get your workspace ID. Make a note of that.

5: Download the following code.


In the “ctx” part, just paste in your service credentials and update the workspace ID. You can get the version number from the Conversation API documentation.

6: Run the Python code. Assuming you put in the correct details, you can type into the console and get your responses back from Conversation. Just type “…” to quit.
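The downloadable code is not reproduced here, but the shape of such a console loop can be sketched. This is an illustration only: `send(text, context)` is a hypothetical callable that wraps whichever client you use (the SDK or Requests) and returns the service’s JSON response as a dict. The quit word is passed in as a parameter, and the sketch assumes the response carries its replies in `output.text` and its state in `context`.

```python
def chat_loop(send, quit_word, read=input, write=print):
    """Relay console input to `send` until the user types `quit_word`.

    `send(text, context)` is a hypothetical wrapper around your client;
    it must return the response JSON as a dict.
    """
    context = {}
    while True:
        text = read("> ")
        if text == quit_word:
            break
        response = send(text, context)
        # Carry the context forward so counters and dialog position survive.
        context = response.get("context", context)
        for line in response.get("output", {}).get("text", []):
            write(line)
    return context
```

Passing `read` and `write` in makes it easy to replay a scripted conversation, or to dump the whole response instead of just the text when you need to see the raw data.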



Dealing with off topics.

When building a conversational system, how you handle off topics tends to swing one way or the other for people new to it.

They either totally forget about handling chit-chat/off topic, which leads to the system feeling stupid to the end user even if it has a high accuracy, or feeling like a search engine.

Or they go the other way and try to answer everything they can possibly think of, without realising the rabbit hole they are going down. Also they may only put in one response to the same type of question, making it feel like a computer program.

The correct way to handle off topics is: Answer the topic not the question, and drive the user back to the intended usage of the conversational system! 

To give an example:

User: Hey Watson, do you love Siri?
Watson: I have never really thought about other AI systems. I am too busy selling goods for pets.

User: Please tell me if you love Siri.
Watson: I would much prefer to help you in buying things for your pet, rather than talking about Artificial intelligences.

User: But do you love Siri?
Watson: Can we please get back to buying things for your pet.

The first response detects that the person is talking about an “AI” related item. But instead of trying to expend time on the nature of the question, you can deflect it with a broad response.

The second response is the same but a bit cooler, and pushes the user back to the topic. The last response stops playing with the user.

You can achieve this quite easily with the following advanced output.

  {
    "output": {
      "text": {
        "values": [
          "I have never really thought about other AI systems. I am too busy selling goods for pets.",
          "I would much prefer to help you in buying things for your pet, rather than talking about Artificial intelligences.",
          "Can we please get back to buying things for your pet."
        ],
        "selection_policy": "sequential"
      }
    }
  }

Someone playing with the system, however, will have it return to the first response on the 4th try. So you can set a counter so that, after X number of off topic responses, it either redirects to a person or stops responding completely.
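To make the wrap-around and the cut-off concrete, here is a small simulation in Python. It only mimics the behaviour described above outside of Conversation. The context key `off_topic_count` and the limit of 3 are invented for the example.

```python
OFF_TOPIC_REPLIES = [
    "I have never really thought about other AI systems. I am too busy selling goods for pets.",
    "I would much prefer to help you in buying things for your pet, rather than talking about Artificial intelligences.",
    "Can we please get back to buying things for your pet.",
]

def off_topic_reply(context, limit=3):
    """Return the next reply, or None once `limit` off topic turns are used up."""
    count = context.get("off_topic_count", 0)
    context["off_topic_count"] = count + 1
    if count >= limit:
        return None  # caller hands over to a person, or stays silent
    # Without the limit, the modulo is what sends a 4th attempt back to reply 1.
    return OFF_TOPIC_REPLIES[count % len(OFF_TOPIC_REPLIES)]

context = {}
for _ in range(4):
    print(off_topic_reply(context))
```

The fourth call returns `None` instead of cycling back to the first reply, which is the point at which you would redirect or go quiet.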

The detection part is trickier. One way to approach this is to have a single intent that detects off topic, and then use a second workspace to drill down to the related off topic. This prevents polluting your main intents, and only requires further calls when needed. I may go into more detail on this later.

For more information on how to handle off topic/chit chat and much more, I strongly recommend doing the Designing Conversational Solutions (Watson Conversation Service) training.

Using counters in Conversation.

In Watson Dialog the use of counters was pretty easy. In Conversation it is as well, but not as intuitive yet. Here I am going to explore three methods of building and using a counter.

First, to create a counter, you need to go to advanced mode in your node. You can then type in something like the following.

  {
    "output": {
      "text": "Keep saying T1"
    },
    "context": {
      "countdown": 3
    }
  }

A common mistake that people make is to forget to add the comma after the output block. Conversation will normally make the editor window red, but doesn’t give any hints if you do this. Worse still, if you don’t notice the red block then Conversation will revert that node if you attempt to test or edit another node.

After this we create a node with the following output.

Counter is at: <? context.countdown-- ?>

This tells Conversation to display the counter we made, then decrement it. Within the node’s condition we can check to ensure the counter is greater than 0; if not, move to the next node to tell them you are finished. For example:


Hiding the counter

Now you may not want the end user to be able to see the counter. In which case we need to mask the executing code from being rendered. To do this we create a node with the code to execute as follows:

<? context.countdown-- ?>

You then set this node to continue from another node, linking to the output (very important!). In the second node, you put the following into the advanced output window.

  {
    "output": {
      "text": {
        "append": false,
        "values": [
          "Still counting down."
        ]
      }
    }
  }

What this does is generate output “Still counting down.”, but because “append” is false, it will erase the previous output. So the user only sees this message.


Do you really need a counter?

The simple solution can be to not use a counter at all. The following code for example will simulate the previous example by just cycling through custom responses.

  {
    "output": {
      "text": {
        "values": [
          "Keep saying T3.",
          "Counting down 3",
          "Counting down 2",
          "Counting down 1",
          "Counting finished!"
        ],
        "selection_policy": "sequential"
      }
    }
  }

Here is the sample conversation script you can play with.

Important reminder!

If you use the counter in your conversation flow, you should be responsible for it! It is good coding practice to declare it in the conversation.

Apart from being able to test, it allows you to see where the application layer would likely set the counter. Plus if something fails at the application layer, you can trap it more easily.

Depending on the complexity of your conversational flow, you may sometimes want the application layer to handle modifying the counter. This is fine as well, but still initialize in conversation, even if the application overwrites that initialization.


Treat me like a human.

So one of the main pitfalls in creating a conversational system is assuming that you have to answer everything, no matter how bad it is. In a real life conversation we only go so far.

If you attempt to answer every question the user stops talking and treats the conversation system more like a search engine. So it’s good to force the user to at least give enough context.

One Word

Watson intents generally don’t work great with a single word. To that end create a node with the following condition.


This will capture the one word responses. You can then say something like: “Can you explain in more detail what it is you want?“.

Of course you can have single word domain terms. They don’t give you enough context to answer the user, but enough to help the user. For example, let’s say you are making a chat bot for a vet’s office. So you may set up a node like this:


You can then list 2-3 questions the user can click and respond to. For example:

1. Why does my fish call me Bob?
2. How do fish sleep?

If you do need to create a list, try to keep it to four items or under.

Two to Three Words

Two to three words can be enough for Watson to understand. But it’s possible that the person is not asking a question. They could be trying to use it like a search engine, or making a statement. You may also want to capture this.

To that end you can use the following condition.

input.text.matches('^\S+ \S+$|^\S+ \S+ \S+$')
AND input.text.matches('^((?!\?).)*$')

This will only capture 2-3 words that do not contain a question mark.
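If you want to try the two patterns outside Conversation, they can be exercised in Python. This is a sketch, assuming `matches` has to match the whole input (as Java’s `String.matches` does), which is why `fullmatch` is used. The function name and the sample phrases are made up.

```python
import re

TWO_OR_THREE_WORDS = re.compile(r"^\S+ \S+$|^\S+ \S+ \S+$")
NO_QUESTION_MARK = re.compile(r"^((?!\?).)*$")

def is_short_statement(text):
    """True for 2-3 single-space-separated words with no question mark."""
    return bool(TWO_OR_THREE_WORDS.fullmatch(text)) and bool(NO_QUESTION_MARK.fullmatch(text))

samples = ["fish", "fish food", "buy fish food", "how fish sleep?", "I want to buy fish food"]
for sample in samples:
    print(repr(sample), is_short_statement(sample))
```

Only “fish food” and “buy fish food” pass: one word and six words fail the first pattern, and the question mark fails the second. Note the pattern expects exactly one space between words.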

Here is the sample conversation script.

Handling Process Flows

While stepping a user through a process flow, don’t assume that the user will ask random questions. Even if they do, you don’t have to answer them. In real life, we wouldn’t try to answer everything if we are in the middle of something.

We may answer something in context, but we are more likely to get impatient, or ask to stop doing the flow and go back to Q&A.

So when creating your flow, try to keep this in mind. If the user asks something outside of what you expect, ask again, only this time make the answers linkable (as long as there is a limited number of answers). For example:

Watson: Do you want a cat or a dog?
User: How do fishes sleep?
Watson: I don’t understand what you mean. Did you want a cat or a dog?

If the user persists, you can create a break out function. As you do a first pass through your user testing, you can see where you need to expand out. Don’t start off coding as if you expect them to break out everywhere.


What is your name?

Asking a person’s name in Conversation allows you to personalise responses later on. However, actually getting a name from a person which looks good in personalisation is quite hard.

First thing to consider is how you ask the question. Take these examples.

  1. What is your name?
  2. Can I get your name?
  3. How do you like to be known as?

All three of these can elicit different responses from the end user, some not the name at all. Possible responses include:

  • (1) Simon O’Doherty
  • (2) Why do you want to know my name?
  • (2) No
  • (3) Hugh Jass

None of these answers are ideal. The first one will look silly in personalisation. The next two need to be addressed. The last one is just silly.

So at this point it is very important to shape what you send to the user. In fact, with everything you write, especially in a process flow, you need to be mindful of how you shape your message.

For this example we are going to use “Hello. Before we begin, can I get your name?“. This is liable to get responses as shown above about “why” and “no”. But you will also find that people will often not even read what is being said. So you could also expect a question.

To deal with this, we can create a simple entity to look for common key terms.


The reason to use an entity over intents is that in a process flow, the scope of questions is very narrow, and we don’t want to interfere with any other intent training that is required.

We can take the following actions on each entity.

Output: “I need your name so I can be more personalised. Can I get your name?” and ask again.

Output: “Before you ask me a question, can I get your name?” and ask again.

Don’t try to be everything to everyone!

I have seen people try to have Conversation answer any question at any time. This can cause serious issues in process flows, and make it seem more like a search engine.

When creating a process flow, don’t assume that the end user will do everything. Test with your users first to see where you need to customise behaviour.

We don’t want to force the person to give their name. So we can customise this with the following advanced code:

    {
        "output": {
            "text": "No problem, you don't need to tell me your name. Please ask me a question"
        },
        "context": {
            "name": ""
        }
    }

We are setting the context variable name to blank, so that personalisation doesn’t have a name.

Condition: input.text != ""
This node actually loads the person’s name, using the following advanced code:

  {
    "output": {
      "text": "Hello <? input.text ?>. Please ask me a question."
    },
    "context": {
      "name": " <? input.text ?>"
    }
  }

Condition: true
Output: “I didn’t quite get that.” and ask again.
We use this final node in case someone enters nothing. We just send back to ask again.

Here is the sample script for above.

Avoid too much personalisation!

When personalising responses, you should avoid doing it too often. Having the person’s name appear infrequently has much more of an impact.

Try to create multiple possible responses, with roughly 1 out of every 4-5 personalised, and have them selected at random.

Outstanding Issues

So while this will work, there are still a number of issues you need to factor in.

Too much information: What happens if they answer “My name is Simon O’Doherty”? Conversation cannot handle this, and will use the whole text.

To work around this, you can try passing the user’s input to AlchemyAPI entity extraction to get the person’s name. Running on the text above gives the following output (truncated):

"entities": [
    {
        "type": "Person",
        "relevance": "0.33",
        "count": "2",
        "text": "Simon O'Doherty"
    }
]

There are still issues with this though, which you will find when you experiment with it. Another option is a simple UIMA dictionary which detects names. I did an example of this long ago, although using WEX or Watson Knowledge Studio may make it a lot easier.


Bad information: Apart from the joke or offensive names, allowing the user to personalise responses means your system is open to abuse. So you will need to do some sanitisation at the application layer to stop this.

Response formatting: In the example script you will see that the context variable name is being set by “ <? ?>”. However, when you test, you will see the preceding space is removed. Without this, you will get some odd formatting like “Hello .” if no name is specified. You need to correct this at the application layer.
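One way the application layer could tidy this up is sketched below. It assumes the application builds the greeting itself from the stored name; `clean_name`, `greeting` and the 40 character limit are made up for the example. It only strips stray punctuation and spacing, so screening joke or offensive names would still need its own check.

```python
import re

def clean_name(raw, max_length=40):
    """Trim the stored name and drop characters that are unlikely in a name."""
    # Keep letters, digits, whitespace, apostrophes (straight or curly) and hyphens.
    name = re.sub(r"[^\w\s'’\-]", "", raw or "").strip()
    name = re.sub(r"\s+", " ", name)
    return name[:max_length]

def greeting(raw_name):
    """Build the greeting, leaving the name out cleanly when there is none."""
    name = clean_name(raw_name)
    return "Hello %s." % name if name else "Hello."

print(greeting(" Simon O'Doherty"))
print(greeting(""))
```

The empty case comes out as “Hello.” rather than “Hello .”, whatever happened to the leading space on the way through.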

Tools of the trade – UltraEdit

So I use a number of applications to help in building and testing Watson. But the main ones I use are as follows.

  • UltraEdit
  • SPSS Modeler
  • Excel
  • Languages: Java and Python
  • iPad (no seriously :))

I want to touch on UltraEdit for this one. I primarily use it for the following.

  • Making mass fixes to data that SPSS can’t easily handle, and that would take too long to script in a language.
  • Making the JSON readable
  • Extracting meaningful data from JSON to work with.
  • Putting that data back easily.

It is a pretty powerful text editor which allows you to do very complex search/replace/modifications to data. I am of the opinion if you prefer a different text editor, then go with what you know. But here are the main scripts I use for UltraEdit.

When you open the conversation JSON file it will look a little bit like this.


Just a mess. There are two scripts I picked up from UltraEdit that allow you to decompress it and compress it back to what you see above.

JSON – readable

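if (!UltraEdit.columnMode) {
  if (!UltraEdit.activeDocument.isSel()) {
    UltraEdit.activeDocument.selectAll(); // nothing selected: use the whole file
  }
  var jsonText = UltraEdit.activeDocument.selection;
  var json = JSON.parse(jsonText);
  var jsonTextFormatted = JSON.stringify(json, null, 2); // indent by 2 spaces
  UltraEdit.activeDocument.write(jsonTextFormatted); // replace the selection
}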

Will turn it into this.


To turn it back again.

JSON – compress

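if (!UltraEdit.columnMode) {
  if (!UltraEdit.activeDocument.isSel()) {
    UltraEdit.activeDocument.selectAll(); // nothing selected: use the whole file
  }
  var jsonText = UltraEdit.activeDocument.selection;
  var json = JSON.parse(jsonText);
  var jsonTextFormatted = JSON.stringify(json, null, null); // no indentation
  UltraEdit.activeDocument.write(jsonTextFormatted); // replace the selection
}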

Now there are purists out there that can live with reading JSON. I’m sure all they can see is blond, brunette, redheads. For me I prefer to have my data with a lot less noise.

To that end I created two macros for extracting the entities and intents from the conversation file, into a format you can open in Excel/SPSS.

Conversation – get intents

if (!UltraEdit.columnMode) {
  if (!UltraEdit.activeDocument.isSel()) { UltraEdit.activeDocument.selectAll(); }
  var jsonText = UltraEdit.activeDocument.selection;
  var json = JSON.parse(jsonText);
  UltraEdit.newFile(); // output goes to a new document
  var intents = json.intents;
  UltraEdit.activeDocument.write('"Question","Intent"\n');
  for (var i = 0; i < intents.length; i++) {
    var intent = intents[i].intent;
    var examples = intents[i].examples;
    for (var j = 0; j < examples.length; j++) {
      UltraEdit.activeDocument.write('"' + examples[j].text + '","' + intent + '"\n');
    }
  }
}

Creates a CSV file you can work with.


Conversation – get entities

if (!UltraEdit.columnMode) {
  if (!UltraEdit.activeDocument.isSel()) { UltraEdit.activeDocument.selectAll(); }
  var jsonText = UltraEdit.activeDocument.selection;
  var json = JSON.parse(jsonText);
  UltraEdit.newFile(); // output goes to a new document
  var entities = json.entities;
  for (var i = 0; i < entities.length; i++) {
    var entity = entities[i].entity;
    var values = entities[i].values;
    for (var j = 0; j < values.length; j++) {
      UltraEdit.activeDocument.write(entity + '\t' + values[j].value + '\t');
      var synonyms = values[j].synonyms;
      var syn = ' ';
      for (var k = 0; k < synonyms.length; k++) {
        syn += '"' + synonyms[k] + '",';
      }
      UltraEdit.activeDocument.write(syn + '\n');
    }
  }
}

This one creates a tab-separated file (TSV) with the entities in a nice readable format.
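If you prefer a different tool, the same two extractions can be sketched in Python. This is an equivalent written for illustration, not a port of the macros. It assumes the workspace JSON has the `intents[].examples[].text` and `entities[].values[].synonyms` layout that the macros read, and the sample workspace is made up.

```python
import csv
import io
import json

def intents_to_csv(workspace, out):
    """Write one "Question","Intent" row per training example."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["Question", "Intent"])
    for intent in workspace.get("intents", []):
        for example in intent.get("examples", []):
            writer.writerow([example["text"], intent["intent"]])

def entities_to_tsv(workspace, out):
    """Write entity, value and a comma-joined synonym list, tab separated."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for entity in workspace.get("entities", []):
        for value in entity.get("values", []):
            synonyms = ",".join(value.get("synonyms") or [])
            writer.writerow([entity["entity"], value["value"], synonyms])

# A tiny made-up workspace, just to show the output shape.
workspace = json.loads("""
{"intents": [{"intent": "fish_sleep",
              "examples": [{"text": "How do fish sleep?"}]}],
 "entities": [{"entity": "pet",
               "values": [{"value": "fish", "synonyms": ["goldfish", "guppy"]}]}]}
""")
buffer = io.StringIO()
intents_to_csv(workspace, buffer)
print(buffer.getvalue())
```

Using the `csv` module means questions that contain quotes or commas are escaped for you. When writing to a real file, open it with `newline=""` as the `csv` documentation recommends.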


I also have scripts to put the intents and entities back into the conversation script, but I don’t plan to release them at this time, as it is so easy to make a mess and I won’t be supporting them.

You can pick up the scripts here.

Handling low confidence answers in Conversation.

So I will be switching between newbie and expert topics in no particular order. This post assumes you already know how to use Conversation.

In earlier versions of Watson, it was designed to handle high, medium and low confidence answers in different ways. With Conversation this all works a little bit differently.

For this example, I am using the NLC demo intents. The first thing you have to do is find your low confidence.

Unlike earlier versions of WEA, the confidence is relative to the number of intents you have. So the quickest way to find the lowest confidence is to send a really ambiguous word.

These are the results I get for determining temperature or conditions.

treehouse = conditions / 0.5940327076534431
goldfish = conditions / 0.5940327076534431
music = conditions / 0.5940327076534431

See a pattern? 🙂 So the low confidence level I will set at 0.6. Next is to determine the higher confidence range. You can do this by mixing intents within the same question text. It may take a few goes to get a reasonable result.

These are results from trying this (C = Conditions, T = Temperature).

hot rain = T/0.7710267712183176, C/0.22897322878168241
windy desert = C/0.8597747113239446, T/0.14022528867605547
ice wind = C/0.5940327076534431, T/0.405967292346557

I purposely left out high confidence ones. In this case I am going to go with 0.8 as the high confidence level.
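The two thresholds split responses into three bands, which can be sketched at the application layer as follows. It assumes the response’s `intents` list holds dicts with a `confidence` value, and that a score exactly on a threshold falls into the higher band. The boundary handling is a guess made for this example.

```python
LOW, HIGH = 0.6, 0.8

def confidence_band(intents, low=LOW, high=HIGH):
    """Classify the top intent of a response as 'low', 'medium' or 'high'."""
    if not intents:
        return "low"  # no intent returned at all
    top = max(intent["confidence"] for intent in intents)
    if top < low:
        return "low"
    if top < high:
        return "medium"
    return "high"

# Scores taken from the results listed above.
print(confidence_band([{"intent": "conditions", "confidence": 0.5940327076534431}]))
print(confidence_band([{"intent": "temperature", "confidence": 0.7710267712183176},
                       {"intent": "conditions", "confidence": 0.22897322878168241}]))
print(confidence_band([{"intent": "conditions", "confidence": 0.8597747113239446}]))
```

With those scores, “goldfish” lands in the low band, “hot rain” in medium and “windy desert” in high.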

Once you have those, you can create your conversation. Your first node is to check that later nodes won’t fail.


The next nodes check to see if a low confidence or medium confidence is hit. For low, it won’t respond. For medium, it will display text, then continue from the next condition to find the actual answer.


When you test it, you get the following results.


You may still need to test with users to tweak the upper value.

Using this fall through method makes it easier to maintain with your intents. You can use a nested option if only certain intents need to be hit.

Here is the Sample file.

Gotchas: When writing numbers in conditions, always start with a digit, as in “0.8”, not “.8”. Otherwise you will get a Dialog node error: EL1049E:(pos 24): Unexpected data after ‘.’: ‘8’