I love Pandas!

Not the bamboo eating kind (but they are cute too), Python Pandas!

But first… Conversation has a new feature!

Logging! 

You can now download your logs from your conversation workspace into a JSON format. So I thought I’d take this moment to introduce Pandas. Some people love the “Improve” UI, but personally I like being able to easily mold the data to what I need.

First, if you are new to Python, I strongly recommend getting a Python Notebook like Jupyter set up or use IBM Data Science Experience. It makes learning so much easier, and you build your applications like actual documentation.

I have a notebook created so you can play along.

Making a connection

As the feature is just out, the SDK’s don’t have the API for it, so I will be using requests library.

url='https://gateway.watsonplatform.net/conversation/api/v1/workspaces/WORKSPACE_ID/logs?version=2017-04-21'
basic_auth = HTTPBasicAuth(ctx.get('username'), ctx.get('password'))
response = requests.get(url=url, auth=basic_auth)
j = json.loads(response.text)

So we have the whole log now sitting in j but we want to make a dataframe. Before we do that however, let’s talk about log analysis and the fields you need. There are three areas we want to analyse in logs.

Quantitive – These are fixed metrics, like number of users, response times, common intents, etc.

Qualitative – This is analysing how the end user is speaking, and how the system interpreted and responded. Some examples would be where the answer returned may give the wrong impression to the end user, or users ask things out of expected areas.

Debugging – This is really looking for coding issues with your conversation tree.

So on to the fields that cover these areas. These are all contained in j['response'].

Field Usage Description
input.text Qualitative This is what the user or the application typed in.
intents[] Qualitative This tells you the primary intent for the users question. You should capture the intent and confidence into columns. If the value is [] then means it was irrelevant.
entities[] Quantitive The entities found in relation to the call. With this and intents though, it’s important to understand that the application can override these values.
output.text[] Qualitative This is the response shown to the user (or application).
output.log_messages Debugging Capturing this field is handy to look for coding issues within your conversation tree. SPEL errors show up here if they happen.
output.nodes_visited Debugging
Qualitive
This can be used to see how a progression through a tree happens
context.conversation_id All Use this to group users conversation together. In some solutions however, one pass calls are sometimes done mid conversation. So if you do this, you need to factor that in.
context.system.branch_exited Debugging This tells you if your conversation left a branch and returned to root.
context.system.branch_exited_reason Debugging If branch.exited is true then this will tell the why. completed means that the branch found a matching node, and finished. fallback means that it could not find a matching node, so it jumps back to root to find the match.
context.??? All You may have context variables you want to capture. You can either do these individually, or code to remove conversation objects and grab what remains
request_timestamp Quantitive
Qualitative
When conversation received the users response.
response_timestamp Quantitive
Qualitative
When conversation responded to the user. You can do a delta to see if there are conversation performance issues, but generally keep one of the timestamp fields for analysis.

 

So we create a row array, and fill it with dict objects of the columns we want to capture. For clarity of the blog post, the sample code below

import pandas as pd
rows = []

# for object in Json Logs array.
for o in j['logs']:
    row = {}
 
    # Let's shorthand the response object.
    r = o['response']
 
    row['conversation_id'] = r['context']['conversation_id']
 
    # We need to check the fields exist before we read them. 
    if 'text' in r['input']: row['Input'] = r['input']['text']
    if 'text' in r['output']:row['Output'] = ' '.join(r['output']['text'])
 
    # Again we need to check it is not an Irrelevant response. 
    if len(r['intents']) > 0:
        row['Confidence'] = r['intents'][0]['confidence']
        row['Intent'] = r['intents'][0]['intent']

    rows.append(row)

# Build the dataframe. 
df = pd.DataFrame(rows,columns=['conversation_id','Input','Output','Intent','Confidence'])
df = df.fillna('')

# Display the dataframe. 
df

When this is run, all going well you end up with something like this:

report1-1804

The notebook has a better report, and is also sorted so it is actually readable.

report2-1804

Once you have everything you need in the dataframe, you can manipulate it very fast and easy. For example, let’s say you want to get a count of the intents found.

# Get the counts.
q_df = df.groupby('Intent').count()

# Remove all fields except conversation_id and intents. 
q_df = q_df.drop(['request TS', 'response TS', 'User Input', 'Output', 'Confidence', 'Exit Reason', 'Logging'],axis=1)

# Rename the conversation_id field to "Count".
q_df.columns = ['Count']

# Sort and display. 
q_df = q_df.sort_values(['Count'], ascending=[False])
q_df

This creates this:

report3-1804

The Jupyter notebook also allows for visualisation of data as well. Although I haven’t put any in the sample notebook.

6 thoughts on “I love Pandas!

  1. Just be aware that using “Try it out” does not store anything into the logging. So if you want to practise with your own application, you need to interact using the API.

  2. Got a link to the sample notebook? Love the blog post – shows how you can use something on your own machine to check on your application, rather than guessing at things, and then blindly throwing in intents and training via the UI.

Leave a Reply