While the complexity of building Conversation has reduced for non-developers, one of the areas that people can sometimes struggle is training intents.
When trying to determine how the system performs, it is important to use tried and true methods of validation. If you go by someone just interacting with the system you can end up with what is called “perceived accuracy”. A person may ask three to four questions, and three may fail. Their perception becomes that the system is broken.
Using cross validation allows you to give a better feel of how the system is working. As will a blind / test set. But knowing how the system performs and trying to trying to interpret the results is where it takes practise.
Take this example test report. Each line is where a question is asked, and it determines what the answer should be, versus what answer came back. The X denotes where an answer failed.
Unless you are used to doing this analysis full time, it is very hard to see the bigger picture. For example, is DOG_HEALTH the issue, or is it LITTER_HEALTH + BREEDER_GUARANTEE?
You have to manually analyse each of these clusters and determine what changes are required to be made.
Thankfully Sci-Kit makes your life easier with being able to create a confusion matrix. With this you can see how each intent performs against each other. So you end up with something like this:
So now you can quickly see what Intents are getting confused with others. So you focus on those to improve your accuracy better. In the example above, DOG_HEALTH and BREEDER_INFORMATION offer the best areas to investigate.
I’ve created a Sample Notebook which demonstrates the above, so you can modify to test your own conversations training.