Debugging your extension

When working with extensions in Watson Assistant, using only the standard UI can make it cumbersome to do a deep-dive analysis of why your extension is not working as expected.

You can use the browser's inspector to look at what is sent and received. Go to the network tab, select “Response”, then filter by “callout”. Once you find the line where the callout is mentioned, remove the filter and you can see all the parts.

For the video demo below I created a sample extension that pulls jokes from “I Can Haz Dad Joke” via their API. The sample extension is attached.

Non-Blocking option buttons

This is a common request that comes up often with the new Watson Assistant.

This is where you offer options at the end of an action which can link off to another action. Using normal options forces the user to respond to the buttons and prevents them from leaving that action.

Another use case is asking the user if they were happy with the response. In this example I’m going to show how to do a thumbs up/down as a follow up.

First we need to create a Thumbs up and a Thumbs down action. Both are exactly the same: one training example (the emoticon 👍 or 👎), and one step that says whether they had a positive or negative response.

Next click the options on the action and make sure to switch off “Ask clarifying questions”.

The same applies to the thumbs up/down and survey actions later.

Once you have created those two actions, next is the survey action. This should contain no training questions, and the settings above switched off. This prevents it from triggering except when called from another action.

In the step, select the </> button to show the JSON, and replace it with the following:

  "generic": [
    {
      "title": "How was my response?",
      "options": [
        {
          "label": "👍",
          "value": {
            "input": {
              "text": "👍"
            }
          }
        },
        {
          "label": "👎",
          "value": {
            "input": {
              "text": "👎"
            }
          }
        }
      ],
      "response_type": "option"
    }
  ]

This creates custom option buttons which are non-blocking.

Lastly in your action you want the response to trigger the survey. You can do this by calling the action as a sub-action.

Now when you run it, you end up with something like below.

As always, I’ve included the sample skill for you to try.

In a real-world production instance it is not good practice to have a thumbs up/down after every response, as this reduces the user's confidence in the system.

Imagine having a customer support person asking for validation after every answer given. You wouldn’t trust their response either.

The other part of it is that often end users will make their thumbs up/down from an emotional response, and not a logical one.

If you really need to do this, then I recommend putting a confidence check into the survey, so it doesn't trigger unless the confidence level is very low. A better option is to use the custom survey demonstrated in the Lendyr demo.

Visualising Coverage in Conversation Logs.

One of the most important parts of a conversational system is ensuring that your end users get the most benefit out of it. Doing this requires looking for patterns in your conversation logs, which can be time consuming.

A common approach is to put markers into your nodes, then look for those entry/exit point markers. But a user question can hit multiple nodes and slots across multiple log lines, making it trickier to see. Here are a couple of approaches to more easily get information on your complex flows.

For this demo I am using the default demo skill in Watson Assistant to generate logs. I have created a number of simple conversations. A couple demonstrate an issue with how the user may interact. I have also supplied the example notebook and files for you to try out.

Creating the graph.

To generate the graph I first need to convert the log to a graph format. The easiest way is to look at the nodes_visited column in the logs. Here is an example of a user making a reservation.

['Reservation using slots', 'handler_104_1498132501942', 'slot_102_1498132501942', 'handler_103_1498132501942', 'handler_6_1509695999145', 'handler_104_1498132501942', 'slot_102_1498132501942', 'handler_103_1498132501942', 'handler_107_1498132552870', 'slot_105_1498132552870']
['slot_105_1498132552870', 'handler_106_1498132552870', 'handler_10_1509132875735', 'slot_8_1509132875735', 'handler_9_1509132875735', 'handler_17_1509135162089', 'handler_104_1498132501942', 'slot_102_1498132501942']
['slot_102_1498132501942', 'handler_103_1498132501942', 'handler_107_1498132552870', 'slot_105_1498132552870', 'handler_106_1498132552870', 'handler_10_1509132875735', 'slot_8_1509132875735']
['slot_8_1509132875735', 'handler_9_1509132875735', 'handler_14_1509133469904', 'handler_24_1522444583114', 'slot_22_1522444583114', 'handler_23_1522444583114', 'handler_22_1522598191131', 'node_3_1519173961259', 'Reservation using slots']

Although each line is an interaction, you can see that it is in fact a chain of events. When joining the chains you end up with:

['Opening'] ['Reservation using slots', 'handler_104_1498132501942', 'slot_102_1498132501942', 'handler_103_1498132501942', 'handler_6_1509695999145', 'handler_104_1498132501942', 'slot_102_1498132501942', 'handler_103_1498132501942', 'handler_107_1498132552870', 'slot_105_1498132552870', 'handler_106_1498132552870', 'handler_10_1509132875735', 'slot_8_1509132875735', 'handler_9_1509132875735', 'handler_17_1509135162089', 'handler_104_1498132501942', 'slot_102_1498132501942', 'handler_103_1498132501942', 'handler_107_1498132552870', 'slot_105_1498132552870', 'handler_106_1498132552870', 'handler_10_1509132875735', 'slot_8_1509132875735', 'handler_9_1509132875735', 'handler_14_1509133469904', 'handler_24_1522444583114', 'slot_22_1522444583114', 'handler_23_1522444583114', 'handler_22_1522598191131', 'node_3_1519173961259', 'Reservation using slots']
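The joining step can be sketched in Python. Each log line's nodes_visited list begins with the final node of the previous line, so consecutive chains can be stitched together (this is a simplified stand-in for the notebook code, with shortened node names):

```python
# Sketch of the chain-joining step: consecutive nodes_visited lists
# overlap by one node, so we stitch them into one continuous path.
def join_chains(chains):
    joined = []
    for chain in chains:
        if joined and joined[-1] == chain[0]:
            joined.extend(chain[1:])   # drop the duplicated overlap node
        else:
            joined.extend(chain)
    return joined

full_path = join_chains([
    ['Reservation using slots', 'handler_104', 'slot_102'],
    ['slot_102', 'handler_103', 'slot_105'],
    ['slot_105', 'handler_106', 'Reservation using slots'],
])
# one continuous chain, with the overlap nodes de-duplicated
```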

The joined chain shows the whole interaction the user had in trying to book an appointment. It’s still not that readable, so I converted the node names to make them a little more readable:

  • slot_ = Take the variable that the slot object depends on.
  • node_ = Take the condition for the node in the skill.
  • frame = Top-level slot node (not shown above; it’s part of the skill node attributes). Take the condition of the node.
  • response = The node that responds to the end user, or is part of the slot. Add “response to: <parent node name>”.
  • handler = Left the same.

Once this was done, I converted the chain to graph nodes and edges. Each time an edge is repeated, a count on the edge object is incremented. You end up with this:
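The edge-counting step can be sketched like so. The notebook stores the count as a networkx edge attribute; a plain Counter stands in here:

```python
from collections import Counter

# Walk a joined chain and count how often each (from_node, to_node)
# hop occurs; repeated hops reveal loops in the conversation.
def edge_counts(chain):
    return Counter(zip(chain, chain[1:]))

counts = edge_counts(['frame', 'slot_phone', 'handler', 'slot_phone', 'handler'])
# the repeated slot_phone -> handler hop gets a count of 2
```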

Red nodes are entry points to a single flow. Orange is a flow which could have been entered though other parts of the conversation. Blue are the slot values. Pink is a final response to the user from the flow.

As you can see it’s still a mess!

By selecting the entry point node you can delete all other nodes that do not have a path to it. In this case I selected “frame: #Customer_Care_Appointments”. This is what was generated:
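The pruning can be sketched as a simple reachability pass (this is an assumption about the approach; NetworkX offers descendants() for the same job):

```python
# Keep only nodes reachable from the selected entry point and drop
# everything else before drawing.
def reachable(edges, entry):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {entry}, [entry]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

edges = [('frame', 'slot_date'), ('slot_date', 'response'), ('other_flow', 'response')]
kept = reachable(edges, 'frame')
# 'other_flow' has no path from the entry point, so it would be deleted
```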

Still a bit of a mess, and not easy to see how the paths flow through the booking appointment. NetworkX was designed more for analysing graphs than visualising them.

Graph to Sankey

So using the generated graph data I moved it over to a Sankey diagram. The nice thing with Plotly is that you can easily move the flows to see what is going on. Here is what is generated using the graph information from the last image.
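Converting the counted graph edges into the shape Plotly's Sankey trace expects might look like this (a sketch, not the notebook's exact code; the rendering call is shown as a comment since it needs plotly installed):

```python
# Plotly's Sankey trace wants parallel source/target/value lists of node
# indices, so we flatten the counted edges into that shape. Edge counts
# become the link widths.
def to_sankey(edge_counts):
    nodes = sorted({n for edge in edge_counts for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    source = [idx[u] for u, v in edge_counts]
    target = [idx[v] for u, v in edge_counts]
    value = list(edge_counts.values())
    return nodes, source, target, value

nodes, source, target, value = to_sankey({('frame', 'slot_phone'): 5,
                                          ('slot_phone', 'handler'): 8})

# import plotly.graph_objects as go
# go.Figure(go.Sankey(node=dict(label=nodes),
#                     link=dict(source=source, target=target, value=value))).show()
```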

Edge colors are red where there is more output from a node than there is input. In a well-trained conversational flow this should be fairly static. Not all red is an issue though. Taking the two biggest, we can use these to drill down to a root cause.


This shows that a lot of users are not progressing through the phone section of the flow and are going into a loop. As the second part is much smaller, it suggests that people are giving up on the flow. Looking through the logs shows the following pattern.

Clearly the end users are having problems trying to enter a valid phone number, so this is something that should be looked at and resolved.


You can see three inputs into the handler before it passes over to the “Ask for date” slot. This isn’t an issue, as there are three conditions under which this could happen.

  • User supplies a date when asking for the appointment.
  • System asks the user for the date.
  • User asks to redo the appointment at final confirmation.

The handler is doing what it should be doing.


So this example is showing just one way to approach this problem. I’d be interested to hear how others are dealing with this.

Creating a Quantum Computer Chatbot.

I normally do these small, quick projects to help practice technologies I work with, and to keep me a bit sane.

For this fun little project I thought about creating a chatbot that can translate a simple conversation into a format that can be understood by a quantum computer.

The plan is to build a Grover Algorithm circuit that will determine the best combination of people who like/dislike each other.

The architecture is as follows:

Breaking down each component.

  • iPad App (Swift): Why? Because JavaScript annoys me. 🙂 Actually creating apps is very easy and Swift is a lovely language. If you haven’t coded in it and want to, I recommend the App Brewery training.
  • Orchestration Layer (Python/Flask): My focus was on speed, and Python has all the modules to easily interact with everything else. Perfect for a backend demo.
  • Watson Assistant: This handles the human interaction. It also pulls out the logical components and actors mentioned in the conversation.
  • Equation Generator: When the user asks to solve the problem, this translates the Watson Assistant results into an equation that Qiskit can run.
  • Quantum Engine: This is just a helper class I created to build and run the quantum circuit, and then hand the results off to the reporting NLP. Of course, what comes back is all 1’s and 0’s.
  • Reporting NLP: This takes the result of the quantum computer and converts it into a meaningful report for the human. This is then handed back to the iPad app to render.

All this was built and running in a day. It’s not because I’m awesome 😉 but because the technology has moved forward so much that much of the heavy lifting is handled for you.

I’m not going to release the code (if you want some code, why not try pong? I wrote it over the weekend). I will go over some of the annoyances that might help others. But first, a demo.

This is a live demo. Nothing is simulated.

Watson Assistant

This was the easiest and most trivial part to set up. Just three intents and one entity. The intents detect if two people should be considered friendly or unfriendly. The names of the two people are picked up by the entity. The last intent just triggers the solve process.

Equation Generator

This is a lot less exciting than it sounds. When sending a formula to Qiskit, it needs to be in a format like so:

((A ^ B) & (C & D) & ~(D & A))

Which is something like “Bob hates Jane, Mike likes Anna, Mike and Bob don’t get on” in normal human speech.

Each single letter has to equate to a person mentioned, so those have to be tracked along with the relationships to build this.
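A minimal sketch of how such a generator might track people and relations (hypothetical helper names, not the project's actual code): each person gets the next free letter, and each relation becomes an AND (likes) or XOR (dislikes) clause, matching the form above.

```python
import string

# Hypothetical equation generator: map each person to a letter and
# combine pairwise relations into one boolean expression for Qiskit.
def build_expression(relations):
    letters = {}

    def letter(name):
        if name not in letters:
            letters[name] = string.ascii_uppercase[len(letters)]
        return letters[name]

    clauses = [
        f"({letter(a)} {'&' if likes else '^'} {letter(b)})"
        for a, b, likes in relations
    ]
    return " & ".join(clauses), letters

expr, legend = build_expression([
    ("Bob", "Jane", False),   # Bob hates Jane  -> XOR clause
    ("Mike", "Anna", True),   # Mike likes Anna -> AND clause
])
# expr == "(A ^ B) & (C & D)"
```

The legend dictionary is what lets the report translate letters back into names later.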

Quantum Computing

So Qiskit literally holds your hand for most of this. It’s a fun API. If you want to start learning quantum computing I strongly recommend “Learn Quantum Computing with Python and IBM Quantum Experience”. It approaches the subject from a developer perspective, making it easier to start working through the math later.

To show how simple it is, Qiskit has a helper class called an Oracle. This is literally all the code to build and run the circuit.

from qiskit import BasicAer
from qiskit.aqua import QuantumInstance
from qiskit.aqua.algorithms import Grover
from qiskit.aqua.components.oracles import LogicalExpressionOracle

# example expression
expression = '((A ^ B) & (C & D) & ~(D & C))'

oracle = LogicalExpressionOracle(expression)
quantum_circuit = oracle.construct_circuit()

quantum_computer = 'qasm_simulator'
quantum_instance = QuantumInstance(BasicAer.get_backend(quantum_computer), shots=2048)

grover = Grover(oracle)
result = grover.run(quantum_instance)

What you get back is mostly 1’s and 0’s. You can also generate graphs from the helper class, but they tend to be more for the Quantum Engineer.


I used the report generated by Qiskit. But as the results are all 0/1 and backwards, I translated them out to A, B, C, D… and then added a legend to the report. That was all straightforward.
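The “backwards” part comes from Qiskit's bit ordering: measurement bitstrings put qubit 0 as the rightmost bit. A small illustrative sketch of the translation, assuming four people mapped to A-D:

```python
# Reverse the bitstring to undo Qiskit's qubit ordering, then map each
# bit to its person-letter for the human-readable report.
def translate(bitstring):
    bits = bitstring[::-1]
    return {letter: bit == "1" for letter, bit in zip("ABCD", bits)}

assignment = translate("0011")
# assignment == {"A": True, "B": True, "C": False, "D": False}
```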

The tricky bit came in sending the image back to the iPad app. To do this I converted the image to base64 like so (using OpenCV):

import base64
import cv2

def imageToBase64(img):
    # encode the image to PNG bytes, then to a base64 string
    b = base64.b64encode(cv2.imencode('.png', img)[1]).decode()
    return f'{b}'

On the Swift side of things you can convert the base64 string back to an image like so.

func base64ToImage(_ base64Text: String) -> UIImage {
    let imageData = Data(base64Encoded: base64Text)
    let image = UIImage(data: imageData!)
    return image!
}
Getting it to render the image in a UITableView was messy. I ended up creating a custom UITableViewCell. This also allowed me to make it feel more chat-botty.

When I get around to cleaning up the code I’ll release and link here.

In closing…

While this was a fun distraction, it’s unlikely to be anything beyond a simple demo. There already exists complex decision optimization that can handle human interaction quite well.

But the field of quantum computing is changing rapidly. So it’s good to get on it early. 🙂

Rendering Intents in a 3D network graph

It’s been a while…

I know some people were asking how to build a network graph of intents/questions. Personally I’ve never found it particularly useful, but I am bored, so I have created some sample code to do this.

The code will convert a CSV intents file to a pandas dataframe, then convert that to networkx graph format. Of course, large graphs can be very messy, like so:
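The CSV-to-graph step can be sketched with just the standard library (the sample code uses pandas and networkx, but the underlying mapping is this): a Watson intents CSV holds rows of (question, intent), and each row becomes an edge from the intent node to the question node. networkx / K3D then handle layout and rendering.

```python
import csv
import io

# Each CSV row (question, intent) becomes an (intent, question) edge,
# so every intent node fans out to its training questions.
def intents_to_edges(csv_text):
    reader = csv.reader(io.StringIO(csv_text))
    return [(intent, question) for question, intent in reader]

edges = intents_to_edges("hello there,greeting\nbye now,goodbye\n")
# edges == [("greeting", "hello there"), ("goodbye", "bye now")]
```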

That’s just 10 intents!

So I then converted this to K3D format so that you can view the same network in 3D, like so:

Hopefully someone finds it useful. 🙂

Multi-Lingual chat bot with cloud functions.

Bonus post for being away for so long! 🙂 Let’s talk about how to do a multi-lingual Chatbot (MLC).

Each skill is trained on its own individual language. You can mix languages in a single skill; however, depending on the language selected, the other is treated as either keywords or default-language words. This can be handy if certain words are commonly used across languages.

For languages like Korean or Arabic, this gives limited ability compared to, say, English. For a pair like English and Spanish, it simply does not work.

There are a number of options to work around this.

Landing point solution

This is where the entry point into your chatbot defines the language to return. It is by far the easiest solution. In this case you would have, for example, an English + Spanish website. If the user enters via the Spanish website, they get the Spanish skill selected. Likewise with English.

Slightly up from this is having the user select the language when the bot starts. This can help where an anonymous user is forced to one language's website, but can’t easily find how to switch.

The downside of this solution is when end users mix languages, which is somewhat common in certain languages, for example Arabic. They may type in the other language, only to get a confused response from the bot.

Preparation work

To show the demo, I first need to create two skills: one in English and one in Spanish. I selected the same intents from the catalog to save time.

I also need to create dialog nodes… but that is so slow to do by hand! 🙁 No problem: I created a small Python script to read the intents and write my dialog nodes for me, like so:

Here is the sample script for this demo. It is totally unsupported, and unlikely to work with a large number of intents, but it should get you started if you want to make your own.

Cloud functions to the rescue!

With the two workspaces created, one for each language, we can now create a cloud function to handle the switching. This post won’t go into detail on creating a cloud function; I recommend the built-in tutorials.

First in the welcome node we will add the following fields.

Field Name      Value                                          Details
$host           ""                                             Set to where you created your cloud function.
$action         "workspaceLanguageSwitch"                      The name of the action we will create.
$language       "es"                                           The language of the other skill.
$credentials    {"user":"USERNAME","password":"PASSWORD"}      The cloud function username + password.
$namespace      "ORG_SPACE/actions"                            The name of your ORG and SPACE.
$workspace_id   "…"                                            The workspace ID of the other skill.

Next we create a node directly after the welcome node, with a condition of “!$language_call” (more on that later). We also add action code as follows.

The action code allows us to call the cloud function that we will create.

The child nodes of this node will either skip if no message comes back, or display the results of the cloud function.

On to the cloud function. We give it the name “workspaceLanguageSwitch”.

This cloud function does the following.

  • Checks that a question was sent in. If not it sends a “” message back.
  • Checks that the language of the question is set to what was requested. For example: In the English skill we check for Spanish (es).
  • If the language is matched, then it sends the question as-is to the other workspace specified. It also sets “$language_call” to true to prevent a loop.
  • Returns the result with confidence and intent.
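The routing logic above might look like this sketch. The language detector and the Watson Assistant /message call are stubbed out as injected functions, since the real cloud function would use the credentials, namespace, and workspace ID set in the welcome node (function and parameter names here are illustrative):

```python
# Hypothetical skeleton of the workspaceLanguageSwitch action.
def workspace_language_switch(params, detect, send_to_workspace):
    text = params.get('text', '')
    if not text:
        return {'message': ''}                 # no question sent in
    if detect(text) != params['language']:
        return {'message': ''}                 # stay in the current skill
    reply = send_to_workspace(params['workspace_id'], text)
    reply['language_call'] = True              # prevent a switching loop
    return reply

# Fake detector and workspace call for illustration:
reply = workspace_language_switch(
    {'text': 'hola', 'language': 'es', 'workspace_id': 'abc123'},
    detect=lambda t: 'es' if t == 'hola' else 'en',
    send_to_workspace=lambda ws, t: {'message': '¡Hola!'},
)
# reply carries the Spanish skill's answer plus the loop-guard flag
```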

Once this is created, we can test in the “Try it out” screen for both Spanish and English.

Here is all the sample code to try and recreate yourself.

It’s not all a bed of roses.

This is just a demo. There are a number of issues you need to be aware of if you take this route.

  • The demo is a one-time shot, so it will only work with Q&A responses. If you plan to use slots or process flows, then you need to add logic to store and maintain the context.
  • Your “Try it out” is free to use, but now that you have a cloud function it will cost money to run tests (albeit very cheap).
  • There is no fault control in the cloud function, so all of that would need to be added.
  • Cloud functions must complete execution within 5 seconds. So if your other skill calls out to an integration, it can potentially break the call.

Simple Intent Tricks

As usual it’s been a while since I’ve updated the blog, so to keep it alive here is a quick update. For this post I’m going to show two simple techniques using the intents[] object rather than the #Intents reference.

As always I recommend that you only implement these if there is evidence it is needed.

Repeating Questions

A common issue with end users is that they don’t always understand the answer they get. It is very common for the user to gloss over what has been shown to them, and they can miss the actual answer they wanted.

More often than not, the user just doesn’t read the answer. 🙂

By default the user asks the question in a slightly different way, gets the same intent, and asks again. One way to find which answers need work is to look for this pattern in the logs.

Another option is to change up the conversation like a human would. To this end, we will detect if they got the same answer. First we create a $last_intent context variable and check it like so:

Now if this node doesn’t trigger, it’s important to clear the $last_intent.

But this node needs to jump back to the branch to continue on. You will notice that I created a dummy node to jump to. This is a good coding convention: put any further nodes below the jump node. This prevents the jump logic from breaking if you add nodes.

Another thing to notice is the Answer General Questions condition logic:


This allows you to group similar context intents into a single node. Normally most people will just pick the related intents for the condition block. So you end up with something like this:

The problem with the above is that you will be more prone to making mistakes. While some of you might have spotted the “and” mistake, what is less noticeable is the missing intent #General_Ending. You can spend ages wondering why your message isn’t being displayed despite it being in the node. With the one-line condition earlier, you don’t have to worry about any of this.

Here is the sample skill to play with.

Compound Questions

Next up is compound questions. I have discussed this before, but from a code perspective. In this example we are going to try doing it from within the skill itself.


  • Watson Assistant Plus already has this feature.
  • Due to how skills work, this example cannot exceed 50 intents. Watson Assistant will disable the dialog logic if it hits the same node 50 times. This is to prevent a possible endless loop.
  • This will likely not work if you have any slots.

If you need it for more than 50 intents, then you will need to do it with code, or WA Plus (which is much easier).

First we start by creating a multi-response node (main condition is anything_else). The first response you should edit in the advanced tab and set as follows:

The condition block is just a way to see if the first and second intents are close. There are a number of ways to do this, for example k-means, or the difference in percent rather than scaled. While k-means is not easily done in WA, this method works but tends to be a bit more sensitive. So play around and see which you prefer.
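One possible version of that closeness test, expressed in Python for clarity (the 0.8 ratio is my assumption for illustration, not the skill's exact condition):

```python
# Treat the question as compound when the second intent's confidence is
# within a scaled margin of the first intent's confidence.
def is_compound(intents, ratio=0.8):
    if len(intents) < 2:
        return False
    first = intents[0]["confidence"]
    second = intents[1]["confidence"]
    return second >= first * ratio

compound = is_compound([
    {"intent": "General_Greetings", "confidence": 0.91},
    {"intent": "General_Ending", "confidence": 0.88},
])
# compound is True, since 0.88 >= 0.91 * 0.8
```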

Once the condition is hit we set a $answer_counter to 1 and $compound_found to true.

If it’s not hit we just set $compound_found to false. We don’t need to worry about the $answer_counter as it will always be 0 when in this position.

For the intent matching in the dialog nodes you cannot use the #Intent shortcuts. Instead you do the following:

    intents[$answer_counter].intent == "General_Greetings"

If the counter is set to 1, it will respond with the second intent first. Then we decrement that counter and loop through all intents again. You end up with something like this:

Again, here is a sample skill:

What did you say?

A question recently asked was “How can I get Watson Conversation to repeat what I last asked it?”. There are a couple of approaches to solve this, and I thought I would blog about them. First, here is an example of what we are trying to achieve.

One thing to understand going forward: everything you build should be data driven. So while there are valid use cases where this is needed, it doesn’t mean it is needed for every solution you build, unless evidence shows otherwise.

Approach 1. Context Variable.

In this example we create a context variable at every node where we want the system to respond, like so: 

This works but prevents easily creating variations of the response. On the plus side you can give normal responses, but when the user asks to repeat, it can give a fixed custom response. 

Approach 2. Context variable everything!

Similar to the last approach, except rather than creating the context variable in the context area, you build it on the fly, like so:

This allows you to have custom responses. A disadvantage (albeit a minor one) is that you increase the chance of a mistake in your code. Each response also adds 4 bytes to your overall skill/workspace size. This means nothing for small workspaces, but when you are at enterprise level you need to be careful.

I’ve attached a sample of the above.

Approach 3. Application layer.

With this method your application layer keeps a previous snapshot of the answer. Then when a repeat intent is detected, you just return the saved answer. 

I’ve seen some crazy stuff, from resending questions to modifying node maps, but really this is the simplest option.
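A minimal sketch of that application-layer cache (class and intent names are illustrative): keep the last answer given, and replay it whenever a repeat intent is detected instead of calling the skill again.

```python
# Cache the previous answer; replay it on a repeat intent.
class RepeatCache:
    def __init__(self):
        self.last_answer = None

    def handle(self, intent, answer_fn):
        # answer_fn stands in for the real call to the assistant
        if intent == "repeat" and self.last_answer is not None:
            return self.last_answer
        self.last_answer = answer_fn()
        return self.last_answer

bot = RepeatCache()
first = bot.handle("store_hours", lambda: "We open at 9am.")
again = bot.handle("repeat", lambda: "this is never used")
# again == "We open at 9am."
```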

Tips for 1st time WKS users.

“Tools are the subtlest of traps”

Most developers learn the dangers of evil wizards; for everyone else it might not be so obvious. The purpose of the Watson tooling is to democratize AI, that is, to abstract the AI layer away from your knowledge worker.

This allows you to utilize the power of AI, without having to search for the mythical person who understands your business and NLP, AI, etc.

While this can remove a lot of the complexity, it can also lead people into a false sense of security that their hand will be held through the whole process.

So I am taking a time out to talk about Watson Knowledge Studio. For those that don’t know what it is: the tool allows you to annotate your domain documents and surface structured insights from unstructured data. It is extremely powerful and easy to use compared to other solutions out there.

The downside is that it is extremely easy to use. So I have seen a number of different people/companies rush in and create a model that disappoints, or in some cases infuriates. It’s not that this is unique to WKS, only that it is easy to overlook important steps in your workflow.

Now, IBM does offer 3-4 days of training, and there are a number of videos (slightly out of date) that cover some of this. But to help those starting off, I am going to list the main pitfalls you need to watch out for when doing your first WKS project.

Understand what you need to surface from your data!

This happens so often with technical people. They look at the tooling, see how easy it is, and run off and annotate the world in their documents.

What normally happens in this regard is a very poor model which surfaces information correctly, but that data is meaningless to the business.

Get your business analysts/SMEs from the start.

You need someone who objectively understands the business problem you are trying to solve. You need to look at your data sources and determine if you can even surface that information (i.e. whether there are enough samples to train).

Limit your Types and Relationships on your first pass.

After you have looked at what you want to surface, you need to focus on a small number of types and relationships. Your BA/SME might have picked 50-100, but generally you should pick around 20-40. There are reasons for this.

  • Each type/relationship adds more work for your human annotator.
  • Models can build faster.
  • As you work through documents you will find that your needs for types/relationships may change.

Don’t reinvent the wheel.

If you have existing annotators that will work as-is, don’t try to integrate them into your model. They may be part of your business requirement, but all you are doing is adding complexity to your model. You can run a second pass on your finished data to get that information.

Understand when to use rule based versus model based.

The purpose of the AI model is to have it train and understand content it has never seen before. To do this requires a lot of up front work on annotation and training.

Compare this to the rule-based model. If you know that new terms/phrases may not come up, but the way they are written changes, then rule-based may solve your issue.

Personally, I think the AI model is the better choice if you plan to go with WKS. There is easier tooling for rule-based, for example Watson Explorer Studio.

Inter-Annotator Agreement is King.

Two things to realize before you start annotating.

  • Just because you are an expert in the content doesn’t mean you are an expert at annotating.
  • The more subject matter experts you have, the less agreement on topics there will be in the real world.

To that end, you need to clearly define your inter-annotator agreement (IAA) so there is no ambiguity or disagreement. Have examples, and also have a single SME as the deciding factor where further disagreements occur.

Not creating a proper IAA can lead to more work for your main SME, and can damage your model to the extent of hours or days of wasted work.

Data Wrangling is required.

Most of the work in formatting your data is to keep your human annotator sane. Annotating a document is a mentally exhausting process that normally follows these steps.

  • Read and understand the paragraph.
  • Annotate the paragraph with types and relationships.
  • Read and annotate the co-references.
  • Fix mistakes as you go.

You want to reduce the amount of time this takes for each document and working set. So if there is information that isn’t required for annotation, remove it. If your documents are very small, then join them together (with some clear marker of a new document).

You want each document to be annotated in 30 minutes or so, and each document set in a day. This will allow you to progress at a reasonable speed and build frequent models.

On top of this, you should also look at sourcing any dictionaries/terms that can be used to kickstart the annotation (which most people do).

Lastly, check how your documents are ingested into WKS. For example, I’ve seen instances of “word.word”. WKS sees this as a single term, and fixing that annotation can be annoying. You may need to do some formatting to limit these mistakes.

Build your model soon and often.

You can’t really see what you are doing wrong until 2-3 models in. So it is important to build these models as soon as you can.

To that end I would recommend building a model as soon as you have a working set completed. Try to have sets be annotated within 1-2 days max, at least at the start of the project.

You can quickly see where the IAA is lacking, and whether you need to change types/relationships or even data. Doing this sooner rather than later prevents the technical debt of fixing the model.

Let the model work for you.

Once you have 3-4 models created, and you are comfortable with some of the scoring, have the model pre-annotate future working sets. It will reduce the mental load on the human annotators.

However! If you still have one or two of the three areas performing badly, recommend that your human annotator just delete the poor-performing part and redo it. For example, if co-reference doesn’t work well, just delete all co-references and redo them. This is considerably faster than trying to manually fix every annotation error.

So I hope this helps those in their first journey into using WKS. Be aware that this is by no means a full tutorial.