Six months later…

I had planned to create an update frequently, but life and more importantly work got in the way. With the new role, a lot of focus is on the other aspects of AI related technologies.

Of course while things are different for me, the chat bot world continues on. Watson Conversation becomes Watson Assistant. With a huge number of updates and changes.

My blog continues to be a source for numerous people starting with Watson Assistant. But Watson development have made changes that makes most redundant.

So until my next update (soon, I promise) let me give a brief update to every blog entry and give Watson Development the credit they deserve.

Note: I mention Beta a lot. In your workspace UI, you now are able to request access to the Watson Assistant Beta, and try out all the new features that are coming. I also didn’t reference every blog entry. If it’s not in the list it’s probably still the same, or not important enough to mention.

Testing your Intents

K-Fold and blind testing is still very much a part of ensuring you have trained the system well. But for those who are currently playing with the Beta will know, there are features coming that reduce this need or even make it redundant (jury is still out on this. I’m favoring the latter).

There are number of Watson Assistant K-Fold and testing apps up on github, if you don’t want to try decipher my example.

Pushing my Buttons

While I can see a need for buttons in some cases, I still believe it destroys the conversational aspect of a chat bot. Overuse turns your chat bot into an app, and damages training. I am also against having in a conversational text response.

Chihuahua or Muffin, revisited.

Huge updates have been made to Watson Visual Recognition (VisRec). The main being it is now integrated into Watson Studio. If you haven’t tried Watson Studio yet, go now! 🙂 It is a Data Science / Machine Learning / Deep Learning development platform.

Watson VisRec still continues to amaze people at how quickly and accurately it can classify custom content, we also have Watson Media which can annotate live video. If RAW POWER is needed, there is PowerAI Vision, which allows for real time classification on video. We are talking “Person of Interest” level classification. 🙂

I have no confidence in Entities.

I have nothing but love for Entities now! Gone are the “keyword” type entities. Now they are built using a ML NLP model. Not only that, it also helps to dramatically improve the training of Intents.

There are now Pattern Entities as well. These allow for complex regular expressions.

The design pattern of using entities to lower the confidence of an intent is still valid.

Manufacturing Intent

Creating manufactured questions is still very much an issue you would want to avoid.

That said, you may have seen Project Debater. While this is personal opinion (and no guarantee it will ever be a feature), I can see this technology augmenting the intent and answer creation of Watson Assistant.

Anaphora? I hardly knew her.

This is still very much a good and well used pattern. I’d love if Watson Development wrapped it into Watson Assistant, but for now it helps easily adding intelligence into your conversation.

Removing the confusion in Intents.

While this can help still in training, the current beta has features which will likely make this redundant.

Watson Conversation just got turned up to 11.

This should be “Watson Assistant” 😉

Slots have continued to improve. You can now create more complex slot responses and handlers. Allowing to jump to nodes within the slot itself, rather then create a conditional tree.

Digressions is also a new feature which allows you to dictate when the user can go off script, and pull them back.

I love Pandas

While I still love using Pandas, Watson Assistant logging analytics has improved dramatically. Again there are features coming in the Beta which make this even better for training your enterprise level chat bots.

For those in Premium IBM Cloud there is also a feature for it to recommend topics you have never seen before, and to help collate questions for intent training of new topics.

I have a Dream …

Since writing that blog post, Watson Speech to Text now uses Deep Learning to understand human speech. There is also a feature of “Acoustic models”, which allows you to train on accents in relation to your domain language.

Compound Questions

This is still a good pattern for compound questions. The current beta features may make this redundant in certain designs.

Improving your Intents with Entities.

With the huge improvements to Entities, I would consider this an anti-pattern. So ignore.

Conversing in three dimensions.

This is still a valid pattern. In fact I’ve seen it used in a number of very successful implementations. Again though, playing with the beta will show this may become redundant from a coding perspective.

Data Science Experience

It’s now “Watson Studio”! There are so many new features in this. The most important being the support of Deep Learning.

Since I wrote that blog you have the following new features (may be more).

  • SPSS modeler. Similar to the desktop version, but allows you to use the raw power of IBM cloud. You can still import/export your streams between the desktop version.
  • Data Refinery. There is a quote that 60% of all data science work is cleaning up data. This helps in doing this.
  • Watson Machine Learning. You give it your data, tell it what you want to use to determine the outcome. It will then run through multiple machine learning models, to find the best one that will work for your data. Something that requires a ML expert to do for even mundane stuff. It even creates the API and a test UI for you.
  • Experiments. Allows you to run numerous models and tweak to find the best one for your data.
  • Neural Network Modeler. Allows you to build your tensorflow / pytorch / keras / caffe models in a simple GUI. It will then write your code which you can export to your applications. Here is a good article on it.

Much more than this as well. Try it out, it’s free. 🙂

Building a Conversation interface in minutes.

Watson Assistant now has the ability to create a number of conversation interfaces through the workbench (little to no coding in some instances). For example, Facebook and Slack.

Understanding how a conversation flows

This is redundant. Actually I can argue that most of what I wrote in 2016 is redundant now.

Conversation now has Folders which allows you to check a branch of nodes, but then continue the flow of the tree, instead of falling back to root.

Nodes now have a “Skip Input” which means you don’t have to put a Jump to get into the branch (which is prone to breaking if you have to add more nodes)

Digressions allow you to jump around looking for the answer and return to where you left off.

… So there you have it. Hopefully everyone is up to date. 🙂 See ya soon.

Chihuahua or Muffin, revisited.

I just finished reading Maria Yao‘s article Chihuahua OR Muffin? Searching For The Best Computer Vision API. It’s a fun read, but I felt it didn’t really show off the power of Watson Visual Recognition.

For the demo in the article, the general classifier was being used.

One of the main advantages of Watson Visual recognition is that you can create your own custom classifiers. It is very simple too.

First, you need data.

Using Marias article I pulled the Chihuahua and Muffin pictures from ImageNet.

Like most data, it tends to need a bit of cleaning. So I deleted any images below 14KB in size. The reason for this was the majority at that size were just corrupted. I also went through and deleted any images which were adverts or “this image is no longer there” banners.

Overall that was 500 images deleted totally. It still left 3,000 images to play with.

Next I created a Visual Recognition service. For this I created the free version. This limited me to 250 events a day. So I had to lower my training sets to 100 pictures from each set.

I took a random 100 from each. I didn’t examine the photos at all, but here are a few to give you an idea of how the images look.

example_training

As you can see, no thought put into worrying about other items in the picture.

I zipped the images up, and created a classifier like so.

classifier

Then I clicked create, and waited a little over 10 minutes for it to analyse the pictures.

Once the classifier was finished training, I was ready to test. As some may not be aware, Visual Recognition also offers a food classifier. So for my first two tests I tried my classifier, General and Food.

test1

So you can see the red bar on the classifier I made. This is more because I only gave 100 examples. As you give more training examples, it’s confidence increases. But you can see that the difference between Muffin and Chihuahua is clear.

You can also see the food classifier got it as well.

What about the Chihuahua?

test2

As you can see, all three do quite well on classification. But what about the original pictures which look similar? I ran those through and ended up with this.

results2

As you can see it got them all right! None of these were used to train against.

As demos go though this is simple and fun. But with good classified images, it can be scary accurate for proper real world use cases.

Having said that, I did have one failure. Testing the samples on Maria’s page, it was able to understand the cookie monster muffin and the man holding the chihuahua.

But the muffin in the plastic bag with the Chihuahua it could not get. I tried cropping out the dog, but it still failed but with a lower confidence. I suspect this is a combination of training and a bad quality photo.

Screen Shot 2017-09-28 at 17.40.17

I have no confidence in Entities.

I have something I need to confess. I have a personal hatred of Entities. At least in their current form.

There is a difference between deterministic and probabilisitic programming, that a lot of developers new to Watson find it hard to switch to. Entities bring them back to that warm place of normal development.

For example, you are tasked with creating a learning system for selling CatsDogs, and Fishes. Collecting questions you get this:

  • I want to get a kitten
  • I want to buy a cat
  • Can I get a calico cat?
  • I want to get a siamese cat
  • Please may I have a kitty?
  • My wife loves kittens. I want to get her one as a present.
  • I want to buy a dog
  • Can I get a puppy?
  • I would like to purchase a puppy
  • Please may I have a dog?
  • Sell me a puppy
  • I would love to get a hound for my wife.
  • I want to buy a fish
  • Can I get a fish?
  • I want to purchase some fishes
  • I love fishies
  • I want a goldfish

The first instinct is to create a single intent of #PURCHASE_ANIMAL and the create entities for the cats, dogs and fishes. Because it’s easier to wrap your head around entities, then it is to wonder how Watson will respond.

So you end up with something like this:

20170910entities

Wow! So easy! Let’s set up our dialog. To make it easier, lets use a slot.

20170910entitiesA

In under a minute, I have created a system that can help someone pick an animal to buy. You even test it and it works perfectly.

IT’S A TRAP!

First the biggest red flag with this is you have now turned your conversation into a deterministic system.

Still doing cross validation to test your intents? Give up, it’s pointless.

You can break it by just typing something like “I want to buy a bulldog”. You are stuck into an endless loop.

The easiest solution is to tell the person what to type, or link/button it. But it doesn’t exhibit intelligence (and I hate buttons more than I hate entities 🙂 ).

The other option is to add “bulldog” to the @Animals:Dog entity. But when you go down that rabbit hole, you could realistically add the following.

  • 500+ types of breeds.
  • Common misspellings of those words.
  • plurals of each breed.
  • slang, variations and nicknames of those animals.

You are easily into the thousands of keywords to match, and all it takes is one person to make a typo you don’t have in the list and it still won’t work.

Using entities in a probabilistic way.

All is not lost! You can still use entities, and keep your system intelligent. First we break up the intents into the types of animals like so:

20170910entitiesB

So now if I type “I want to buy a bulldog” I get #PurchaseDog with 68% confidence. Which is great, as I didn’t even train it on that word.

So next I try “I want to buy a pet” and I get #PurchaseCat with 55% confidence.

20170910entitiesCat

Hmm, great for cat lovers but we want conversation to be not so sure about this.

So we create the entities as before for Cat, Dog, Fish. You can use the same values.

Next before you check intents, add a node with the following condition.

20170910entitiesC

This basically ensures that irrelevant hasn’t been hit, and then checks if the animal entities have not been mentioned.

Then in your JSON response you add the following code.

{
    "context": {
    "adjust_confidence": "<? intents[0].confidence = intents[0].confidence - 0.36 ?>"
    },
    "output": {
        "text": {
            "values": [],
            "selection_policy": "sequential"
        }
    }
}

The important part is the “adjust_confidence” context variable. This will lower the first intents confidence by 0.36 (36%).

We set the node to jump to the next node in line, so it can check the intents.

Now we get “I don’t understand” for the pet question. Bulldog still works as it doesn’t fall below the 20%.

Demo Details.

I used 36% for the demo, but this will vary in other projects. Also if your confidence level is too high, you can pick a smaller value, and then have another check for a lower bound. In other words, set your conversation to ignore any intent with a confidence lower then 30%, and then set your adjustment confidence to -10%.

Advantages

Using this approach, you don’t need to worry as much with training your entities, only your intents. This allows you to build a probabilistic model which isn’t impacted unless it is unsure to begin with.

I have supplied a Sample Conversation which demos above.

Manufacturing Intent

Let me start this article with a warning:  Manufacturing questions causes more problems than it solves.

Sure the documentation, and many videos say the reverse. But they tend to give examples that have a narrow scope.

Take the car demo for example. It works because there is a common domain language that everyone who uses a car knows. For someone who has never seen a car before, won’t understand what a “window wiper” is, but they may say something like “I can’t see out the window, because it is raining”.

This is why when building your conversation system, it is important to get questions from people who actually use the system, but don’t know the content. They tend not to know how to ask a question to get the answer.

But there are times when it can’t be avoided. For example, you might be creating a system that has no end users yet. In this case, manufacturing questions can help bootstrap the system.

There are some things to be aware of.

Manual creation.

This is actually very hard, even for the experienced. Here are the things you need to be aware of.

Your education and culture will shape what you write.

You can’t avoid it. Even if you are aware of this, you will fall back into the same patterns as you progress through creating questions. It’s not easy to see until you have a large sample. Sorting can give you a quick glance, while a bag of words makes it more evident.

If you know the content, you will write what you know.

Again, having knowledge of the systems answers will have you writing domain language in the questions. You will use terms that define the system, rather than describing what it does.

If you don’t know the content, use user stories.

If you manage to get someone who could be a representative user, be careful in how you ask them to write questions. If they don’t fully understand what you ask, they will use terms as keywords, rather than their underlying meaning.

Let’s compare two user stories:

  • “Ask questions about using the window wipers.”
  • “It is raining outside while you are driving, and it is getting harder to see. How might you ask the car to help you?”

With the first example, you will find that people will use “window wipers”, “wipers”, “window” frequently. Most of the questions will be about switching it on/off.

With the second example, you may end up seeing questions like this.

  • Switch on the windshield wipers.
  • Activate the windscreen wipers.
  • Is my rain sensor activated?
  • Please clear the windows.

Your final application will shape the questions as well.

If you have your question creation team working on a desktop machine, they are going to create questions which won’t be the same as someone typing on mobile, or talking to a robot.

The internet can be your friend.

Looking for similar questions online in forums can help you in seeing terms that people may use. For example, all these mean the same thing: “NCT”, “MOT”, “Smog Test”, “RWC”, “WoF”, “COF”.

But those are meaningless to people if they are in different countries.

Automated Creation

A lot of what I have seen in automation tends not to fare much better. If it did, we wouldn’t need people to write questions. 🙂

One technique is to try and create a few questions from existing questions. Again I should stress, this is generally a bad idea, and this example doesn’t work well, but might give you something to build on.

Take this example.

  • Can my child purchase a puppy?
  • Are children allowed to buy dogs?

From a manual view we can see the intent is to allow minors to buy. Going over to the code.

For this I am using Spacy, which can check the similarity of each word against each other. For example.

import spacy
nlp = spacy.load('en')

dog = nlp(u'dog')
puppy = nlp(u'puppy')
print( dog.similarity(puppy))

Will output: 0.760806754875

The higher the number, the closer the words to each other. By setting a threshold on the value, you can reduce it to the important words. Setting a threshold of 0.7, we get:

Screen Shot 2017-09-09 at 18.31.18

Playing with larger questions you will find that certain parts of speech (POS) are more noise. So you can drop the following to remove possible noise.

  • DET = determiner
  • PART = particle
  • CCONJ = conjunction
  • ADP = adposition

Now that you have reduced it to the main terms, you can build a synonym off of these, like so:

dog = nlp('dog')
word = nlp.vocab[dog.text]
sym = sorted(word.vocab, 
             key=lambda w: word.similarity(w), 
             reverse=True
)
print('\n'.join([w.orth_ for w in sym[:10]]))

 

Which will print out the following:

  • dog
  • DOG
  • Dog
  • dogs
  • DOGS
  • Dogs
  • puppy
  • Puppy
  • PUPPY
  • pet

As you can see, a lot of repetition. You can remove duplicates. Also be wary to set the upper bound of the sym object when reading.

So after you generate a group of sample synonyms, you end up with something like this.

Screen Shot 2017-09-09 at 18.50.07

Now it’s a simple matter of just generating a random set of questions from this table. You end up with something like:

  • need children purchasing dogs
  • can kids buying puppy
  • may child purchases dog
  • will children purchase dogs
  • will kids buy pets
  • make children cheap puppies
  • need children purchase puppies

As you can see, they are pretty bad. Not because of the word salad, but that you have a very narrow scope of what can be answered. But it can give you enough to have your intent trigger.

You can also mitigate this by creating a tensor of a number of similar questions, n-grams instead of single words, a custom domain dictionary and increasing your dictionary terms.

At the end of the day though, they are still going to be manufactured.

Removing the confusion in intents.

While the complexity of building Conversation has reduced for non-developers, one of the areas that people can sometimes struggle is training intents.

When trying to determine how the system performs, it is important to use tried and true methods of validation. If you go by someone just interacting with the system you can end up with what is called “perceived accuracy”. A person may ask three to four questions, and three may fail. Their perception becomes that the system is broken.

Using cross validation allows you to give a better feel of how the system is working. As will a blind / test set. But knowing how the system performs and trying to trying to interpret the results is where it takes practise.

Take this example test report. Each line is where a question is asked, and it determines what the answer should be, versus what answer came back. The X denotes where an answer failed.

report_0707

Unless you are used to doing this analysis full time, it is very hard to see the bigger picture. For example, is DOG_HEALTH the issue, or is it LITTER_HEALTH + BREEDER_GUARANTEE?

You have to manually analyse each of these clusters and determine what changes are required to be made.

Thankfully Sci-Kit makes your life easier with being able to create a confusion matrix. With this you can see how each intent performs against each other. So you end up with something like this:

confusion_matrix

So now you can quickly see what Intents are getting confused with others. So you focus on those to improve your accuracy better. In the example above, DOG_HEALTH and BREEDER_INFORMATION offer the best areas to investigate.

I’ve created a Sample Notebook which demonstrates the above, so you can modify to test your own conversations training.

 

Watson Conversation just got turned up to 11.

I just wanted to start with why my blog doesn’t get updated that often.

  1. Life is hectic. With continual travel, and a number of other things going on, the chance to be able to sit down for peace and quiet is rare. That may change one way or another soon.
  2. I try to avoid other peoples thunder on a range of issues. For example I recommend keeping an eye on Simon Burns blog in relation to the new features.
  3. My role continues to evolve, at the moment that is one of “Expert Services”, where people like myself go around the world to help our customers run the show themselves. This probably warrants it’s own page, but it is a balance between what is added value to that service, and what everyone should know. (I’m in the latter camp, but I got to eat too).

The biggest reason for lack of updates? Development keep changing things! I’m not the only one that suffers from this. I read a fantastic design patterns document earlier this year by someone in GBS, which went mostly out of date a week or two after I got it.

So it has been the same situation with the latest update to conversation. It is a total game changer. To give you an example of how awesome it is, here is an image of a sample “I want to buy a dog” dialog flow.

pre-slots

Now compare that to the new Slots code. (btw, you may have noticed the UI looks cooler too).

slots

The same functionality took a fraction of the time, and has even more complexity of understanding than the previous version. That single node looks something like this:

slots2

That’s all for the moment, better if you play with it yourself. I have some free time shortly, and I will be posting some outstanding stuff.

 

I have a dream…

Following on from Speech to Text, let’s jump over to Text to Speech. Similar to conversation, what can make or break the system is the tone and personality you build into the system.

Developers tend to think about the coding, and not the user experience so much.

To give an example, let’s take a piece of a very famous speech from MLK. Small sample so it doesn’t take all day:

I still have a dream. It is a dream deeply rooted in the American dream.

I have a dream that one day this nation will rise up and live out the true meaning of its creed: “We hold these truths to be self-evident, that all men are created equal.”

Let’s listen to Watson as it directly translates.

It sounds like how I act when I am reading a script. 🙂

Now lets listen to MLK.

You can feel the emotion behind it. The pauses and emphasis adds more meaning to it. Thankfully Watson supports SSML, which allows you to mimic the speech.

For this example I only used two tags. The first was <parsody> which allows Watson to have the same speaking speed as MLK. The other tag was <break> which allows me to make those dramatic pauses.

Using Audacity I was able to put the generated speech against the MLK speech. Then selecting the pause areas, I can quickly see the pause lengths.

audicity

I finally ended up with this:

Audacity also allows you to overlay audio, to get a feel to how it would sound if there were crowds listening.

The final script ends up like this:

<prosody rate="x-slow">I still have a dream.</prosody>
<break time="1660ms"></break>
<prosody rate="slow">It is a dream deeply rooted in the American dream.</prosody>
<break time="500ms"></break>
<prosody rate="slow">I have a dream</prosody>
<break time="1490ms"></break>
<prosody rate="x-slow">that one day</prosody>
<break time="1480ms"></break>
<prosody rate="slow">this nation <prosody rate="x-slow">will </prosody>ryeyes up</prosody>
<break time="1798ms"></break>
<prosody rate="slow">and live out the true meaning of its creed:</prosody>
<break time="362ms"></break>
<prosody rate="slow">"We hold these truths to be self-evident,</prosody>
<break time="594ms"></break>
<prosody rate="slow">that all men are created equal."</prosody>

I have zipped up all the files for download, just in case you are having issues running the audio.

IHaveADream.zip

In closing, if you plan to build a conversational system that speaks to the end user, you also need skills in talking to people, just not being able to write.

Speech to Text and Conversation

I thought I would take a moment to play with Speech to Text and a utility that was released a few months ago.

The Speech to Text Utils allows you to train S2T using your existing conversational system. To give a quick demo, I got my son to ask about buying a puppy.

I set up some quick Python code to print out results:

import json
from watson_developer_cloud import SpeechToTextV1

# ctx is Service credentials copied from S2T Service. 

s2t = SpeechToTextV1(
 username=ctx.get('username'),
 password=ctx.get('password')
)

def wav(filename, **kwargs):
  with open(filename,'rb') as wav:
    response = s2t.recognize(wav, content_type='audio/wav', **kwargs)

if len(response['results']) > 0: 
  return response['results'][0]['alternatives'][0]['transcript']
else:
  return '???';

So testing the audio with the following code:

wav_file = 'p4u-example1.wav'
print('Broadband: {}'.format(wav(wav_file)))
print('NarrowBand: {}'.format(wav(wav_file,model='en-US_NarrowbandModel')))

Gets these results:

Broadband: can I get a puppy 
NarrowBand: can I get a puppy

Of course the recording is crystal clear, which is why such a good result. So I added some ambient noises from SoundJay to the background. So now it sounds like it is in a subway.

Running the code above again get’s these results.

Broadband: Greg it appropriate 
Narrowband: can I get a phone

Ouch!

Utils to the rescue!

So the purpose of asking about a puppy is that I have a sample conversation system that is about buying a dog. Using that conversation file I did the following.

1: Installed Speech to Text Utils.

2: Before you begin you need to set up the connection to your S2T service (using service credentials).

watson-speech-to-text-utils set-credentials

It will walk you through the username and password.

3: Once that was set up, I then tell it to create a customisation.

watson-speech-to-text-utils corpus-from-workspace puppies4you.json

You need to map to a particular model. For testing, I attached it to en-US_NarrowbandModel and en-US_BroadbandModel.

4: Once it was run, I get the ID numbers for the customisations.

watson-speech-to-text-utils customization-list

Once I have the ID’s I try the audio again:

wav_file='p4u-example2.wav'
print('Broadband: {}'.format(wav(wav_file,customization_id='beeebd80-2420-11e7-8f1c-176db802f8de',timestamps=True)))
print('Narrowband: {}'.format(wav(wav_file,model='en-US_NarrowbandModel',customization_id='a9f80490-241b-11e7-8f1c-176db802f8de')))

This outputs:

Broadband: can I get a puppy 
Narrowband: can I get a phone

So the broadband now works. Narrowband is likely the quality is too poor to work with. There is also more specialised language models for children done by others to cope with this.

One swallow does not make a summer.

So this is one example, of one phrase. Really for testing, you should test the whole model. From a demonstration from development, it was able to increase a S2T model accuracy from around 50% to over 80%.

 

 

Watson V3 Certification

ibm-certified-application-developer-watson-v3-certificationSo I got my Watson V3 Certification a week or so ago, and the badge just arrived yesterday.

I sat the mock exam without studying and passed. So I thought I’d try the real exam, and passed that too.

Overall if you have been working in the Watson group for 3+ years, where your job role is to have medium to expert knowledge of all (non-Health) Watson products, then you are probably going to find the exam OK to pass.

For people who haven’t, it’s not going to be easy. I strongly recommend following the study guide on the test preparation certification page if you plan to get this.

My only quibbles on the exam is that the technology changes a lot.

For example, all the design patterns for coding conversation before December last year are not that relevant any more, and will likely change again soon. (Which is part reason for lack of updates on the blog, the other being laziness 🙂 )

So you need to know the current active technologies even if they are going away. Plus there will probably be a V4 exam in 6 months or so time.

I’d also like to see more focused certifications for some parts of the Watson Developer Cloud. For example, being an expert at Discovery Service, doesn’t make you an expert of Conversation and vise-versa.