Monthly Archives: April 2017

I love Pandas!

Not the bamboo-eating kind (though they are cute too), Python Pandas!

But first… Conversation has a new feature!


You can now download the logs from your Conversation workspace in JSON format. So I thought I’d take this moment to introduce Pandas. Some people love the “Improve” UI, but personally I like being able to easily mold the data to what I need.

First, if you are new to Python, I strongly recommend getting a Python notebook like Jupyter set up, or using IBM Data Science Experience. It makes learning so much easier, and you build your applications like actual documentation.

I have a notebook created so you can play along.

Making a connection

As the feature is just out, the SDKs don’t have the API for it yet, so I will be using the requests library.

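import requests
from requests.auth import HTTPBasicAuth

# ctx holds the service credentials and url is the logs endpoint for the workspace.
basic_auth = HTTPBasicAuth(ctx.get('username'), ctx.get('password'))
response = requests.get(url=url, auth=basic_auth)
j = response.json()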

So we have the whole log now sitting in j but we want to make a dataframe. Before we do that however, let’s talk about log analysis and the fields you need. There are three areas we want to analyse in logs.

Quantitative – These are fixed metrics, like number of users, response times, common intents, etc.

Qualitative – This is analysing how the end user is speaking, and how the system interpreted and responded. Some examples would be where the answer returned may give the wrong impression to the end user, or where users ask things outside the expected areas.

Debugging – This is really looking for coding issues with your conversation tree.

So on to the fields that cover these areas. These are all contained in the response object of each entry in j['logs'].

| Field | Usage | Description |
| --- | --- | --- |
| input.text | Qualitative | This is what the user or the application typed in. |
| intents[] | Qualitative | This tells you the primary intent for the user’s question. You should capture the intent and confidence into columns. If the value is [], the input was treated as irrelevant. |
| entities[] | Quantitative | The entities found in relation to the call. With this and intents though, it’s important to understand that the application can override these values. |
| output.text[] | Qualitative | This is the response shown to the user (or application). |
| output.log_messages | Debugging | Capturing this field is handy to look for coding issues within your conversation tree. SPEL errors show up here if they happen. |
| output.nodes_visited | Debugging | This can be used to see how a progression through a tree happens. |
| context.conversation_id | All | Use this to group a user’s conversation together. In some solutions, however, one-pass calls are sometimes done mid conversation. So if you do this, you need to factor that in. |
| context.system.branch_exited | Debugging | This tells you if your conversation left a branch and returned to root. |
| context.system.branch_exited_reason | Debugging | If branch_exited is true, this tells you why. completed means that the branch found a matching node and finished. fallback means that it could not find a matching node, so it jumped back to root to find the match. |
| context.??? | All | You may have context variables you want to capture. You can either do these individually, or code to remove the conversation objects and grab what remains. |
| request_timestamp | Quantitative | When conversation received the user’s response. |
| response_timestamp | Quantitative | When conversation responded to the user. You can do a delta to see if there are conversation performance issues, but generally keep one of the timestamp fields for analysis. |
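
To make the table a bit more concrete, here is a rough sketch of how one log entry could be flattened into a row that covers the debugging and timing fields as well. The helper name flatten_entry, the column names, and the timestamp format (ISO 8601 with milliseconds and a trailing Z) are my assumptions for illustration, and the code looks for the timestamps on the log entry first and then inside the response, since I am not asserting where they sit.

```python
from datetime import datetime

# Assumed timestamp format, e.g. 2017-04-20T10:00:00.250Z
TS_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'

def flatten_entry(entry):
    """Flatten one log entry into a row dict (hypothetical helper)."""
    r = entry.get('response', {})
    context = r.get('context', {})
    system = context.get('system', {})
    output = r.get('output', {})

    row = {
        'conversation_id': context.get('conversation_id'),
        'Input': r.get('input', {}).get('text', ''),
        'Output': ' '.join(output.get('text', [])),
        'Nodes Visited': output.get('nodes_visited', []),
        'Logging': output.get('log_messages', []),
        'Branch Exited': system.get('branch_exited', False),
        'Exit Reason': system.get('branch_exited_reason', ''),
    }

    # Whatever is left in context after removing the conversation
    # objects is treated as an application context variable.
    row['Context'] = {k: v for k, v in context.items()
                      if k not in ('conversation_id', 'system')}

    # Response time in seconds, only if both timestamps are present.
    req = entry.get('request_timestamp') or r.get('request_timestamp')
    res = entry.get('response_timestamp') or r.get('response_timestamp')
    if req and res:
        delta = datetime.strptime(res, TS_FORMAT) - datetime.strptime(req, TS_FORMAT)
        row['Response Time'] = delta.total_seconds()
    return row
```

A list of these rows can be passed straight to pd.DataFrame, and the Response Time column then gives you the delta without keeping both timestamps.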


So we create a row array, and fill it with dict objects of the columns we want to capture. For clarity of the blog post, the sample code below only captures a few of the fields.

import pandas as pd
rows = []

# For each object in the JSON logs array.
for o in j['logs']:
    row = {}
    # Let's shorthand the response object.
    r = o['response']
    row['conversation_id'] = r['context']['conversation_id']
    # We need to check the fields exist before we read them.
    if 'text' in r['input']: row['Input'] = r['input']['text']
    if 'text' in r['output']: row['Output'] = ' '.join(r['output']['text'])
    # Again we need to check it is not an Irrelevant response.
    if len(r['intents']) > 0:
        row['Confidence'] = r['intents'][0]['confidence']
        row['Intent'] = r['intents'][0]['intent']
    # Add the finished row to the list.
    rows.append(row)

# Build the dataframe.
df = pd.DataFrame(rows,columns=['conversation_id','Input','Output','Intent','Confidence'])
df = df.fillna('')

# Display the dataframe.
df

When this is run, all going well you end up with something like this:


The notebook has a better report, and is also sorted so it is actually readable.


Once you have everything you need in the dataframe, you can manipulate it quickly and easily. For example, let’s say you want to get a count of the intents found.

# Get the counts.
q_df = df.groupby('Intent').count()

# Keep only the conversation_id column (the intent is now the index).
q_df = q_df[['conversation_id']]

# Rename the conversation_id field to "Count".
q_df.columns = ['Count']

# Sort and display.
q_df = q_df.sort_values('Count', ascending=False)
q_df

This creates this:


The Jupyter notebook also allows for visualisation of the data, although I haven’t put any in the sample notebook.
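
If all you want is the intent counts, value_counts gets there in fewer steps. Here is a minimal sketch; the dataframe is a tiny made-up sample standing in for the one built from the logs, and the intent names are invented for illustration.

```python
import pandas as pd

# Tiny made-up sample standing in for the dataframe built from the logs.
df = pd.DataFrame({
    'conversation_id': ['a', 'a', 'b', 'c'],
    'Intent': ['greeting', 'buy_dog', 'greeting', ''],
})

# Count rows per intent; value_counts sorts by count, largest first.
counts = df['Intent'].value_counts()

# Turn the Series into a dataframe with a single "Count" column.
q_df = counts.to_frame(name='Count')
print(q_df)
```

Note that the blank intent (the irrelevant inputs after fillna) shows up as its own row, which is handy if you want to see how much traffic falls outside your intents.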

I have a dream…

Following on from Speech to Text, let’s jump over to Text to Speech. Similar to conversation, what can make or break the system is the tone and personality you build into the system.

Developers tend to think about the coding, and not the user experience so much.

To give an example, let’s take a piece of a very famous speech from MLK. Small sample so it doesn’t take all day:

I still have a dream. It is a dream deeply rooted in the American dream.

I have a dream that one day this nation will rise up and live out the true meaning of its creed: “We hold these truths to be self-evident, that all men are created equal.”

Let’s listen to Watson as it directly translates.

It sounds like how I act when I am reading a script. 🙂

Now let’s listen to MLK.

You can feel the emotion behind it. The pauses and emphasis add more meaning to it. Thankfully Watson supports SSML, which allows you to mimic the speech.

For this example I only used two tags. The first was <prosody>, which allows Watson to have the same speaking speed as MLK. The other tag was <break>, which allows me to make those dramatic pauses.

Using Audacity I was able to put the generated speech against the MLK speech. Then selecting the pause areas, I can quickly see the pause lengths.


I finally ended up with this:

Audacity also allows you to overlay audio, to get a feel for how it would sound if there were crowds listening.

The final script ends up like this:

<prosody rate="x-slow">I still have a dream.</prosody>
<break time="1660ms"></break>
<prosody rate="slow">It is a dream deeply rooted in the American dream.</prosody>
<break time="500ms"></break>
<prosody rate="slow">I have a dream</prosody>
<break time="1490ms"></break>
<prosody rate="x-slow">that one day</prosody>
<break time="1480ms"></break>
<prosody rate="slow">this nation <prosody rate="x-slow">will </prosody>ryeyes up</prosody>
<break time="1798ms"></break>
<prosody rate="slow">and live out the true meaning of its creed:</prosody>
<break time="362ms"></break>
<prosody rate="slow">"We hold these truths to be self-evident,</prosody>
<break time="594ms"></break>
<prosody rate="slow">that all men are created equal."</prosody>

I have zipped up all the files for download, just in case you are having issues running the audio.
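
Since the pause lengths in the final script above are really just measurements, it can be easier to keep them as data and generate the SSML from that. This is only a sketch: build_ssml is a hypothetical helper of mine, and it only handles flat segments, not the nested <prosody> tag used for "will" in the script.

```python
from xml.sax.saxutils import escape

def build_ssml(segments):
    """Build SSML lines from (rate, text, pause_ms) tuples.

    pause_ms is the pause to insert after the segment, or None for no pause.
    """
    lines = []
    for rate, text, pause_ms in segments:
        # escape() protects any &, < or > in the spoken text.
        lines.append('<prosody rate="{}">{}</prosody>'.format(rate, escape(text)))
        if pause_ms is not None:
            lines.append('<break time="{}ms"></break>'.format(pause_ms))
    return '\n'.join(lines)

# First few segments of the speech, with the pauses measured in Audacity.
segments = [
    ('x-slow', 'I still have a dream.', 1660),
    ('slow', 'It is a dream deeply rooted in the American dream.', 500),
    ('slow', 'I have a dream', None),
]
ssml = build_ssml(segments)
print(ssml)
```

Tweaking a pause then means changing one number in the list rather than hunting through the markup.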

In closing, if you plan to build a conversational system that speaks to the end user, you also need skills in talking to people, not just in writing.

Speech to Text and Conversation

I thought I would take a moment to play with Speech to Text and a utility that was released a few months ago.

The Speech to Text Utils allows you to train S2T using your existing conversational system. To give a quick demo, I got my son to ask about buying a puppy.

I set up some quick Python code to print out results:

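import json
from watson_developer_cloud import SpeechToTextV1

# ctx is the service credentials copied from the S2T service.
# The constructor arguments are assumed; the original call was cut off here.
s2t = SpeechToTextV1(username=ctx.get('username'), password=ctx.get('password'))

def wav(filename, **kwargs):
  # Send the audio file to S2T and return the top transcript, or '???' if none.
  with open(filename, 'rb') as audio:
    response = s2t.recognize(audio, content_type='audio/wav', **kwargs)

  if len(response['results']) > 0:
    return response['results'][0]['alternatives'][0]['transcript']
  return '???'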

So testing the audio with the following code:

wav_file = 'p4u-example1.wav'
print('Broadband: {}'.format(wav(wav_file)))
print('NarrowBand: {}'.format(wav(wav_file,model='en-US_NarrowbandModel')))

Gets these results:

Broadband: can I get a puppy 
NarrowBand: can I get a puppy

Of course the recording is crystal clear, which is why the result is so good. So I added some ambient noise from SoundJay to the background, and now it sounds like it was recorded in a subway.

Running the code above again gets these results:

Broadband: Greg it appropriate 
NarrowBand: can I get a phone


Utils to the rescue!

So the purpose of asking about a puppy is that I have a sample conversation system that is about buying a dog. Using that conversation file I did the following.

1: Installed Speech to Text Utils.

2: Before you begin you need to set up the connection to your S2T service (using service credentials).

watson-speech-to-text-utils set-credentials

It will walk you through the username and password.

3: Once that was set up, I told it to create a customisation.

watson-speech-to-text-utils corpus-from-workspace puppies4you.json

You need to map to a particular model. For testing, I attached it to en-US_NarrowbandModel and en-US_BroadbandModel.

4: Once it had run, I got the ID numbers for the customisations.

watson-speech-to-text-utils customization-list

Once I had the IDs, I tried the audio again:

print('Broadband: {}'.format(wav(wav_file,customization_id='beeebd80-2420-11e7-8f1c-176db802f8de',timestamps=True)))
print('Narrowband: {}'.format(wav(wav_file,model='en-US_NarrowbandModel',customization_id='a9f80490-241b-11e7-8f1c-176db802f8de')))

This outputs:

Broadband: can I get a puppy 
Narrowband: can I get a phone

So the broadband now works. For narrowband, the audio quality is likely too poor to work with. There are also more specialised language models for children, built by others, to cope with this.

One swallow does not make a summer.

So this is one example, of one phrase. Really, you should test the whole model. In a demonstration from the development team, the utility was able to increase an S2T model’s accuracy from around 50% to over 80%.
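
For testing more than one phrase you need some way of scoring the transcripts. The post doesn't say which metric the demonstration used, so as an assumption on my part, here is a minimal sketch of word error rate (word-level edit distance divided by the number of words expected), run against the transcripts from above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError('reference must contain at least one word')

    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # word missing from transcript
                            curr[j - 1] + 1,       # extra word in transcript
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)

expected = 'can I get a puppy'
for label, heard in [('clean', 'can I get a puppy'),
                     ('noisy broadband', 'Greg it appropriate'),
                     ('noisy narrowband', 'can I get a phone')]:
    print('{}: {:.2f}'.format(label, word_error_rate(expected, heard)))
```

Averaging this over a folder of recordings with known transcripts, before and after applying the customisation, would give a fairer picture than a single phrase.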



Watson V3 Certification

So I got my Watson V3 Certification a week or so ago, and the badge just arrived yesterday.

I sat the mock exam without studying and passed. So I thought I’d try the real exam, and passed that too.

Overall if you have been working in the Watson group for 3+ years, where your job role is to have medium to expert knowledge of all (non-Health) Watson products, then you are probably going to find the exam OK to pass.

For people who haven’t, it’s not going to be easy. I strongly recommend following the study guide on the test preparation certification page if you plan to get this.

My only quibble with the exam is that the technology changes a lot.

For example, all the design patterns for coding conversation before December last year are not that relevant any more, and will likely change again soon. (Which is part of the reason for the lack of updates on the blog, the other being laziness 🙂 )

So you need to know the current active technologies even if they are going away. Plus there will probably be a V4 exam in six months or so.

I’d also like to see more focused certifications for some parts of the Watson Developer Cloud. For example, being an expert in Discovery Service doesn’t make you an expert in Conversation, and vice versa.