Not the bamboo-eating kind (but they are cute too): Python Pandas!
But first… Conversation has a new feature!
Logging!
You can now download the logs from your conversation workspace in JSON format. So I thought I'd take this moment to introduce Pandas. Some people love the "Improve" UI, but personally I like being able to easily mold the data into what I need.
First, if you are new to Python, I strongly recommend setting up a Python notebook like Jupyter, or using IBM Data Science Experience. It makes learning so much easier, and you build your applications like actual documentation.
I have a notebook created so you can play along.
Making a connection
As the feature is just out, the SDKs don't have an API for it yet, so I will be using the requests library.
```python
import json
import requests
from requests.auth import HTTPBasicAuth

url = 'https://gateway.watsonplatform.net/conversation/api/v1/workspaces/WORKSPACE_ID/logs?version=2017-04-21'

# ctx is a dict holding the service credentials.
basic_auth = HTTPBasicAuth(ctx.get('username'), ctx.get('password'))

response = requests.get(url=url, auth=basic_auth)
j = json.loads(response.text)
```
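One thing to be aware of: the logs endpoint pages its results, and my understanding is that the response carries a `pagination` object with a `next_url` field when there are more events (check the API reference for your service version). A hedged sketch for collecting every page:

```python
import requests
from requests.auth import HTTPBasicAuth

def fetch_all_logs(url, auth):
    """Follow pagination.next_url until all log events are collected."""
    logs = []
    while url:
        page = requests.get(url=url, auth=auth).json()
        logs.extend(page.get('logs', []))
        next_url = page.get('pagination', {}).get('next_url')
        # Assumption: next_url is a path relative to the gateway host.
        url = 'https://gateway.watsonplatform.net' + next_url if next_url else None
    return logs
```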
So we have the whole log now sitting in `j`, but we want to make a dataframe. Before we do that, however, let's talk about log analysis and the fields you need. There are three areas we want to analyse in logs.
Quantitative – These are fixed metrics, like number of users, response times, common intents, etc.
Qualitative – This is analysing how the end user speaks, and how the system interpreted and responded. Examples would be answers that may give the wrong impression to the end user, or users asking things outside the expected areas.
Debugging – This is looking for coding issues within your conversation tree.
So, on to the fields that cover these areas. These are all contained in each log event's `response` object.
| Field | Usage | Description |
| --- | --- | --- |
| input.text | Qualitative | This is what the user or the application typed in. |
| intents[] | Qualitative | This tells you the primary intent for the user's question. You should capture the intent and confidence into columns. If the value is [] then the question was deemed irrelevant. |
| entities[] | Quantitative | The entities found in relation to the call. With this and intents though, it's important to understand that the application can override these values. |
| output.text[] | Qualitative | This is the response shown to the user (or application). |
| output.log_messages | Debugging | Capturing this field is handy for spotting coding issues within your conversation tree. SpEL errors show up here if they happen. |
| output.nodes_visited | Debugging, Qualitative | This can be used to see how a progression through the tree happens. |
| context.conversation_id | All | Use this to group a user's conversation turns together. In some solutions, however, one-pass calls are sometimes made mid-conversation. If you do this, you need to factor that in. |
| context.system.branch_exited | Debugging | This tells you if your conversation left a branch and returned to root. |
| context.system.branch_exited_reason | Debugging | If branch_exited is true, this tells you why. completed means the branch found a matching node and finished. fallback means it could not find a matching node, so it jumped back to root to find a match. |
| context.??? | All | You may have context variables of your own you want to capture. You can either do these individually, or write code to remove the Conversation-managed objects and grab what remains (see the sketch after this table). |
| request_timestamp | Quantitative, Qualitative | When Conversation received the user's message. |
| response_timestamp | Quantitative, Qualitative | When Conversation responded to the user. You can take a delta of the two to spot performance issues, but generally keep at least one of the timestamp fields for analysis. |
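For the `context.???` row above, here is a minimal sketch of the "grab what remains" approach. It assumes (hypothetically) that the only Conversation-managed keys in `context` are `system` and `conversation_id`:

```python
# Hedged sketch: strip the Conversation-managed keys and keep the rest.
SYSTEM_KEYS = {'system', 'conversation_id'}

def custom_context(context):
    """Return only the custom context variables."""
    return {k: v for k, v in context.items() if k not in SYSTEM_KEYS}

# Usage inside the row-building loop below:
#   row.update(custom_context(r['context']))
```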
So we create a rows array, and fill it with dict objects of the columns we want to capture. To keep the blog post readable, the sample code below captures only a handful of the fields from the table:
```python
import pandas as pd

rows = []

# For each object in the JSON logs array.
for o in j['logs']:
    row = {}

    # Let's shorthand the response object.
    r = o['response']

    row['conversation_id'] = r['context']['conversation_id']

    # We need to check the fields exist before we read them.
    if 'text' in r['input']:
        row['Input'] = r['input']['text']
    if 'text' in r['output']:
        row['Output'] = ' '.join(r['output']['text'])

    # Again we need to check, as [] means an irrelevant response.
    if len(r['intents']) > 0:
        row['Confidence'] = r['intents'][0]['confidence']
        row['Intent'] = r['intents'][0]['intent']

    rows.append(row)

# Build the dataframe.
df = pd.DataFrame(rows, columns=['conversation_id', 'Input', 'Output', 'Intent', 'Confidence'])
df = df.fillna('')

# Display the dataframe.
df
```
When this is run, all going well you end up with something like this:
The notebook has a better report, and is also sorted so it is actually readable.
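The table above mentioned using the timestamps to look for performance issues. Here is a hedged sketch, assuming you also captured `request_timestamp` and `response_timestamp` into `request TS` and `response TS` columns while building the rows:

```python
# Assumes each row also captured the timestamps, e.g.:
#   row['request TS'] = o['request_timestamp']
#   row['response TS'] = o['response_timestamp']
df['request TS'] = pd.to_datetime(df['request TS'])
df['response TS'] = pd.to_datetime(df['response TS'])

# Response time per message; sort to surface the slowest calls.
df['Response time'] = df['response TS'] - df['request TS']
df.sort_values('Response time', ascending=False).head()
```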
Once you have everything you need in the dataframe, you can manipulate it very quickly and easily. For example, let's say you want a count of the intents found.
```python
# Get the counts.
q_df = df.groupby('Intent').count()

# Remove all fields except conversation_id.
q_df = q_df.drop(['Input', 'Output', 'Confidence'], axis=1)

# Rename the conversation_id field to "Count".
q_df.columns = ['Count']

# Sort and display.
q_df = q_df.sort_values(['Count'], ascending=[False])
q_df
```
This produces the following:
Jupyter notebooks also allow you to visualise the data, although I haven't included any visualisations in the sample notebook.
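As a quick taste of what that looks like (this isn't in the sample notebook), here is a sketch of charting the intent counts from `q_df` above using pandas' built-in matplotlib plotting:

```python
import matplotlib.pyplot as plt

# Bar chart of the intent counts computed above.
q_df.plot(kind='bar', legend=False)
plt.ylabel('Count')
plt.tight_layout()
plt.show()
```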