As I blogged about earlier, the usual way to find conflicting training questions in your intents is K-Fold testing. There are issues with using it, though:
- It removes some of your training data, which can weaken intents.
- You have to balance the number of folds for accuracy against speed.
- It requires creating multiple workspaces.
- Once you have the results, you still have to interpret them.
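To make the first two issues concrete, here is a minimal sketch of the fold mechanics (not the code from the earlier post): each fold is held out once while the rest forms the training set. In a real K-Fold run against Watson Assistant, every training set would become its own temporary workspace, which is where the extra cost comes from.

```python
def k_fold_splits(questions, k=3):
    """Partition questions into k folds. Each fold is held out
    exactly once while the remaining folds form the training set."""
    folds = [questions[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [q for j, fold in enumerate(folds) if j != i for q in fold]
        yield train, held_out

# Example: 6 questions, 3 folds -> 3 train/test splits.
questions = ["q1", "q2", "q3", "q4", "q5", "q6"]
for train, held_out in k_fold_splits(questions, k=3):
    print(len(train), "trained,", len(held_out), "held out")
```

Note how every held-out question is also removed from training, which is exactly why the model you are testing is weaker than the one you ship.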
Looking back at an old blog post on compound questions, I realized that the same technique already reveals where a question can be confused between different intents.
So the next step was to work out how to apply this to intent testing. Removing a question and retraining is not an option: it takes too long and offers nothing over the existing K-Fold testing.
Sending the question back as-is will always return a 100% match, and every other intent gets a score of 0.0, so you can't see any confusion. But what if you changed the question?
First up, I took the workspace from the compound questions post. Despite being tiny, it works quite well, so I had to manufacture some confusion. I did this by taking a question from one intent and pasting it into another (from #ALLOW_VET into #DOG_HEALTH):
- Can my vet examine the puppies?
Next up, in the code we define a "safe word", which is prepended to every question sent to Watson Assistant. In this case I selected "SIO". What was interesting when testing this is that even a safe word that makes no sense can still impact results at enterprise scale (i.e. thousands of questions).
We end up with this response:
- SIO Can my vet examine the puppies?
- ALLOW_VET = 0.952616596221924
- DOG_HEALTH = 0.9269126892089845
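The safe-word check above can be sketched as follows. This is a minimal, self-contained illustration: `get_intent_scores` and the 0.5 threshold are assumptions of mine, and the stub stands in for a real Watson Assistant `message()` call made with `alternate_intents=True`.

```python
SAFE_WORD = "SIO"

def find_confused_intents(question, get_intent_scores, threshold=0.5,
                          safe_word=SAFE_WORD):
    """Prepend the safe word, score the question against all intents,
    and return every intent still clearing the threshold, best first.
    More than one hit means the question is confused between intents."""
    scores = get_intent_scores(f"{safe_word} {question}")
    return sorted(
        ((intent, conf) for intent, conf in scores.items() if conf >= threshold),
        key=lambda pair: -pair[1],
    )

# Stub standing in for the Watson Assistant call; in practice this
# would send the text to the workspace and collect intent confidences.
def fake_scores(text):
    return {"ALLOW_VET": 0.9526, "DOG_HEALTH": 0.9269, "DOG_FOOD": 0.02}

print(find_confused_intents("Can my vet examine the puppies?", fake_scores))
```

Two intents coming back above the threshold is the signal we were missing when sending the question unmodified.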
Great! We can clearly see that these two intents are confused with each other. So, using the K-Means code from before, we can document actual confusion between intents.
I’ve created some Sample Code you can play with to recreate this.
Just in case you are not aware, Watson Assistant also has a premium feature that does something similar, called “conflict resolution” (CR). Here is a good video explaining the feature.
Compared to CR and K-Fold:
- This sample code is faster and cleaner than K-Fold.
- It narrows confusion down to an individual question.
- With customization, it can show confusion across multiple intents.
- Unlike K-Fold, later runs only require you to test new and updated intents.
- It is considerably slower than CR: CR takes only a few seconds to test everything, while the sample can take up to a second per question.
- It costs an API call for each question tested.
- It can’t test the model as if the question were absent from training (which is what K-Fold does).
- CR is slightly more accurate in detection (based on my tests).
- CR lets you swap or edit questions within the same UI; the example I posted requires manual work.