So this really only helps if you are working with a large number of intents, and you have not used entities as your primary method of determining intent.
First, let's talk about perceived accuracy, and what this is trying to solve. Perceived accuracy is where someone types in a few questions they already know the answer to. Then, based on that manual test, they perceive the system to be working or failing.
It gives the person training the system a false sense of how it is performing.
If you have done the Watson Academy training for Conversation, you will have heard it mention K-fold testing. For this blog post, I’m going to skip the details, as I briefly mentioned it before.
K-fold cross validation: You split your training set into K random segments. Use one segment to test and the rest to train. You then work your way through all of them, so each segment is used as the test set exactly once. This method will test everything, but it is extremely time-consuming, because you have to train K times. You also need to pick a good size for K so that you can test correctly.
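To make the splitting concrete, here is a minimal sketch in plain Python. It is not the Watson Conversation API. The names `k_fold_splits`, `k_fold_accuracy` and `train_and_classify` are made up for illustration, and `train_and_classify` is a placeholder for whatever you use to train on one set of questions and get back something that can classify new text.

```python
import random


def k_fold_splits(examples, k, seed=0):
    """Yield (train, test) pairs, using each of the k folds as the test set once."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    # Shuffle with a fixed seed so the random segments are reproducible.
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled examples into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


def k_fold_accuracy(examples, k, train_and_classify, seed=0):
    """Fraction of (text, intent) examples classified correctly across all folds.

    train_and_classify is a hypothetical callback: it takes the training
    examples and returns a function mapping text to a predicted intent.
    """
    correct = 0
    for train, test in k_fold_splits(examples, k, seed):
        classify = train_and_classify(train)  # one full training run per fold
        correct += sum(1 for text, intent in test if classify(text) == intent)
    return correct / len(examples)
```

Each call to `train_and_classify` stands for a full training run, which is where the time goes: with K folds you train K times.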
K-fold works well by itself if you have a large training set that has come from representative real-world users. You will find this rarely happens, so you should use it in conjunction with a blind test.
Previously I didn’t cover how you actually do the test. So with that, here is the notebook giving a demonstration: