One of the most common requests for a conversation system is being able to understand the running topic of a conversation.
USER: Can I feed my goldfish peas?
WATSON: Goldfish love peas, but make sure to remove the shells!
USER: Should I boil them first?
The "them" in the second user question is called an "anaphora". The "them" refers back to the peas, so you can't answer the question without first knowing the previous question.
On the face of it, it looks easy. But "goldfish", "peas", and "shells" could all potentially be the referent, and no one wants to boil their goldfish!
So the tricky part is determining the topic. There are a number of ways to approach this.
The most obvious way is to determine what entity the person mentioned, and store it for later use. This works well if the user actually mentions an entity to work with. However, in a general conversation the subject may not always be supplied by the person asking the question.
Also, when determining the intent of a question, there may not be any entity involved at all, so this approach is of limited help on its own.
That said, there are certain cases where intents are used with a particular context in mind. Those can be handled easily by adding a suffix to the intent name. For example:
In this case we believe that peas is a common entity that has a relationship to the Feeding Fish intent. As a coding convention we use "_e_" to denote that the following piece of the intent name is an entity identifier.
At the application layer, you can run a regex like "_e_(.*?)$" on the intent name and take the group 1 result. If it is not blank, store it in a context variable.
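A minimal sketch of that application-layer check (the intent name here is illustrative, not from a real workspace):

```python
import re

def extract_entity(intent_name):
    """Pull the entity identifier off an intent name that follows
    the "_e_" suffix convention, e.g. "feeding_fish_e_peas"."""
    match = re.search(r"_e_(.*?)$", intent_name)
    return match.group(1) if match else None

context = {}
entity = extract_entity("feeding_fish_e_peas")
if entity:                       # only store when the suffix is present
    context["entity"] = entity

print(context)                   # {'entity': 'peas'}
```

An intent with no suffix simply returns nothing, so the stored context is left untouched.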
As before, you can use regular expressions to capture a pattern mentioned earlier and store it for use at a later point.
One way to approach this is to have a gateway node that activates before working through the intent tree. Something like this:
The downside to this is the complexity of maintaining one large regular expression.
You can make maintenance at least a little easier by setting the primary condition check to "true" and then putting the individual checks in the node itself.
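The same gateway idea can be sketched at the application layer: a check that always runs, with the individual patterns kept together in one list. The entity words and context keys below are illustrative assumptions:

```python
import re

# Each check captures something the user mentioned so it can be
# stored for later. Patterns and context keys are made up for the sketch.
CONTEXT_CHECKS = [
    (re.compile(r"\b(goldfish|guppy|tetra)\b", re.I), "fish"),
    (re.compile(r"\b(peas|flakes|pellets)\b", re.I), "food"),
]

def gateway(user_input, context):
    """Gateway check that always activates (the "true" condition),
    with the individual regex checks inside the node itself."""
    for pattern, key in CONTEXT_CHECKS:
        match = pattern.search(user_input)
        if match:
            context[key] = match.group(1).lower()
    return context

context = gateway("Can I feed my goldfish peas?", {})
print(context)   # {'fish': 'goldfish', 'food': 'peas'}
```

Adding a new thing to track is then one line in the list, rather than surgery on a single monster expression.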
An answer unit is the text response you give back to the end user. Once you have responded with an answer, you have created a lot of context within that answer that the user may follow up on. For example:
Even with the context markers in the answer, the end user may never pick up on them. So it is very important to craft your answer so that it drives the user toward the context you have selected.
The last option is to pass the questions through NLU (Natural Language Understanding). This should give you the key terms and phrases to store as context, and it can also be used to create knowledge graph information.
I have the context. Now what?
When the user asks a question that has no context of its own, you will normally get back low-confidence intents or an irrelevant response.
If you are using intent-based context, you can check the returned intents for a context similar to what you have stored. This also allows you to discard unrelated intents. The results are not always stellar, but it only costs a single call.
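A rough sketch of that filter, using the "_e_" suffix convention from earlier. The intents are shown as simple (name, confidence) pairs rather than the actual service response shape:

```python
def filter_by_context(intents, stored_entity):
    """Keep only intents whose _e_ suffix matches the stored context,
    falling back to the full list if nothing matches."""
    suffix = "_e_" + stored_entity
    related = [(name, conf) for name, conf in intents if name.endswith(suffix)]
    return related or intents

intents = [
    ("boiling_vegetables_e_peas", 0.42),
    ("cleaning_tank_e_bowl", 0.40),
]

print(filter_by_context(intents, "peas"))
# [('boiling_vegetables_e_peas', 0.42)]
```

With "peas" stored as context, the low-confidence "boiling" intent wins out over the unrelated tank-cleaning one.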
The other option you can take is to preload the question that was asked and send it back. For example:
PEAS !! Can I boil them first?
You can use the !! as a marker that your question is trying to determine context. Handy if you need to review the logs later.
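A minimal sketch of building and detecting that marker (function names here are my own, not from any library):

```python
CONTEXT_MARKER = " !! "

def preload(stored_entity, question):
    """Prefix the stored context onto the question before resending it.
    The !! marker flags the call as a context-resolution attempt,
    which makes these calls easy to spot when reviewing logs."""
    return stored_entity.upper() + CONTEXT_MARKER + question

def is_context_retry(text):
    """Detect whether an utterance in the logs was a context retry."""
    return CONTEXT_MARKER in text

resent = preload("peas", "Can I boil them first?")
print(resent)                     # PEAS !! Can I boil them first?
print(is_context_retry(resent))   # True
```

Note this costs a second call to the service for every question that comes back with low confidence.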
As time passes…
So as the conversation presses on, what the person is talking about can drift away from the original context, but that context may still remain the dominant one. One solution is to build a weighted context list.
"entity_list" : "peas, food, fish"
In this case we maintain the last three contexts found. As a new context is found it is pushed onto the front of the list and the oldest one drops off. Of course this means more API calls, which can cost money.
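A small sketch of maintaining that list, producing the same "entity_list" context variable shown above (the class name and size are assumptions for the example):

```python
from collections import deque

class ContextList:
    """Keep the last N contexts found, newest first."""

    def __init__(self, size=3):
        self.entities = deque(maxlen=size)

    def add(self, entity):
        if entity in self.entities:       # refresh rather than duplicate
            self.entities.remove(entity)
        self.entities.appendleft(entity)  # newest first; oldest falls off

    def as_context(self):
        # Serialised the same way as the "entity_list" variable above.
        return {"entity_list": ", ".join(self.entities)}

ctx = ContextList()
for e in ("fish", "food", "peas"):
    ctx.add(e)
print(ctx.as_context())   # {'entity_list': 'peas, food, fish'}
```

When a fourth context such as "bowl" arrives, "fish" silently drops off the end of the list.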
Lowering calls on the tree.
Another option is to create a poor man's knowledge graph. Let's say the last two contexts were "bowl" and "peas". Rather than creating multiple context nodes, you can build a tree which can be passed back to the application layer.
"entity" : "peas->food->care->fish" ... "entity" : "bowl->care->fish"
You can use something like Apache TinkerPop to create a knowledge graph (IBM Graph in Bluemix is based on it).
Now when a low confidence question is found, you can use “bowl”, “peas” to disambiguate, or use “care” as the common entity to find the answer.
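Finding that common entity from the serialised paths can be sketched without any graph library at all, walking each path from its most specific end:

```python
def common_entity(path_a, path_b):
    """Return the first entity shared by two "a->b->c" paths,
    checking from the most specific end of the first path."""
    steps_b = set(path_b.split("->"))
    for step in path_a.split("->"):
        if step in steps_b:
            return step
    return None

# The two entity paths from the example above.
print(common_entity("peas->food->care->fish", "bowl->care->fish"))  # care
```

Here "care" is the first shared node, so it is the natural disambiguation point before falling all the way back to the generic "fish" root.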
Talk… like.. a… millennial…
One more common form of anaphora you have to deal with is how people talk on instant messaging systems: the question is often split across multiple lines.
Normal conversation systems take one entry and give one response, so this just wrecks their AI head. Not only do you need to know where the real question stops, but also where the next one starts.
One way to approach this is to capture the average time between each entry from the user. You can do this by passing timestamps from the client to the backend, which then builds an average of how that user talks. This needs to be done at the application layer.
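A rough sketch of that timing heuristic. The two-times-average threshold is an assumption for illustration, not a tuned value:

```python
def average_gap(timestamps):
    """Average seconds between consecutive entries from one user.
    `timestamps` are epoch seconds passed up from the client."""
    if len(timestamps) < 2:
        return None
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)

def message_complete(timestamps, now, factor=2.0):
    """Heuristic: treat the question as finished once the user has
    been quiet noticeably longer than their own average gap."""
    avg = average_gap(timestamps)
    if avg is None:
        return True        # not enough history; just send it through
    return (now - timestamps[-1]) > avg * factor

ts = [100.0, 103.0, 105.0]          # user fires off three quick lines
print(average_gap(ts))              # 2.5
print(message_complete(ts, 106.0))  # False: probably still typing
print(message_complete(ts, 112.0))  # True: long pause, send to the bot
```

The buffered lines would then be joined into a single utterance before being sent to the conversation service.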
Hopefully this gives you some insight into how context is handled within a conversation system.