I thought I would take a moment to play with Speech to Text and a utility that was released a few months ago.
The Speech to Text Utils allows you to train S2T using your existing conversational system. To give a quick demo, I got my son to ask about buying a puppy.
I set up some quick Python code to print out results:
    import json

    from watson_developer_cloud import SpeechToTextV1

    # ctx holds the service credentials copied from the S2T service.
    s2t = SpeechToTextV1(
        username=ctx.get('username'),
        password=ctx.get('password')
    )


    def wav(filename, **kwargs):
        """Send a WAV file to S2T and return the top transcript, or '???' if none."""
        # The file handle is named audio so it does not shadow this function.
        with open(filename, 'rb') as audio:
            response = s2t.recognize(audio, content_type='audio/wav', **kwargs)

        if len(response['results']) > 0:
            return response['results'][0]['alternatives'][0]['transcript']
        else:
            return '???'
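The helper above only reads the first result. As far as I understand the response format, a longer recording can come back as several result segments, so here is a minimal sketch that joins the top alternative of each one. The name `full_transcript` is hypothetical, and the sample dictionary is hand-written in the shape the code above indexes into; it is not real service output.

```python
def full_transcript(response, placeholder='???'):
    """Join the top alternative of every result segment into one string."""
    pieces = []
    for result in response.get('results', []):
        alternatives = result.get('alternatives', [])
        if alternatives:
            # The first alternative is the one the original helper reads.
            pieces.append(alternatives[0]['transcript'].strip())
    return ' '.join(pieces) if pieces else placeholder


# Hand-written sample (an assumption, not captured from the service).
sample = {
    'results': [
        {'alternatives': [{'transcript': 'can I get a puppy '}]},
        {'alternatives': [{'transcript': 'please '}]},
    ]
}

print(full_transcript(sample))
print(full_transcript({'results': []}))
```

For a short phrase like the one in this post there is only one segment, so both approaches give the same text.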
So testing the audio with the following code:
    wav_file = 'p4u-example1.wav'
    print('Broadband: {}'.format(wav(wav_file)))
    print('NarrowBand: {}'.format(wav(wav_file, model='en-US_NarrowbandModel')))
Gets these results:
    Broadband: can I get a puppy
    NarrowBand: can I get a puppy
Of course, the recording is crystal clear, which is why the result is so good. So I added some ambient noise from SoundJay to the background, and now it sounds as if it was recorded in a subway.
Running the same test again gives these results:
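If you want to reproduce the noisy recording yourself, mixing can be done in plain Python. This is only a sketch of one way to do it, not necessarily how the file in this post was made. It assumes both files are 16-bit PCM WAV with the same sample rate and channel count; `mix_samples`, `mix_wav` and the `noise_gain` parameter are names made up for illustration.

```python
import array
import sys
import wave


def mix_samples(speech, noise, noise_gain=0.5):
    """Add looped, scaled noise to 16-bit speech samples, clipping to int16."""
    if not noise:
        return list(speech)
    mixed = []
    for i, sample in enumerate(speech):
        value = sample + int(noise[i % len(noise)] * noise_gain)
        mixed.append(max(-32768, min(32767, value)))
    return mixed


def _read_int16(wav_reader):
    """Read every frame of an open WAV file as signed 16-bit samples."""
    samples = array.array('h', wav_reader.readframes(wav_reader.getnframes()))
    if sys.byteorder == 'big':
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def mix_wav(speech_path, noise_path, out_path, noise_gain=0.5):
    """Write speech plus background noise to out_path."""
    with wave.open(speech_path, 'rb') as sp, wave.open(noise_path, 'rb') as nz:
        if sp.getsampwidth() != 2 or nz.getsampwidth() != 2:
            raise ValueError('expected 16-bit PCM audio')
        if (sp.getframerate(), sp.getnchannels()) != (nz.getframerate(), nz.getnchannels()):
            raise ValueError('sample rate and channel count must match')
        params = sp.getparams()
        speech = _read_int16(sp)
        noise = _read_int16(nz)

    mixed = array.array('h', mix_samples(speech, noise, noise_gain))
    if sys.byteorder == 'big':
        mixed.byteswap()
    with wave.open(out_path, 'wb') as out:
        out.setparams(params)
        out.writeframes(mixed.tobytes())
```

Raising `noise_gain` makes the background louder, which is a handy way to find the point where recognition starts to fall apart.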
    Broadband: Greg it appropriate
    Narrowband: can I get a phone
Ouch!
Utils to the rescue!
The reason for asking about a puppy is that I have a sample conversation system about buying a dog. Using that conversation file, I did the following.
1: Installed Speech to Text Utils.
2: Before you begin you need to set up the connection to your S2T service (using service credentials).
watson-speech-to-text-utils set-credentials
It will walk you through entering the username and password.
3: Once that was set up, I told it to create a customisation.
watson-speech-to-text-utils corpus-from-workspace puppies4you.json
You need to map the customisation to a particular model. For testing, I attached it to en-US_NarrowbandModel and en-US_BroadbandModel.
4: Once it had run, I got the IDs for the customisations.
watson-speech-to-text-utils customization-list
Once I had the IDs, I tried the audio again:
    wav_file = 'p4u-example2.wav'
    print('Broadband: {}'.format(wav(wav_file, customization_id='beeebd80-2420-11e7-8f1c-176db802f8de', timestamps=True)))
    print('Narrowband: {}'.format(wav(wav_file, model='en-US_NarrowbandModel', customization_id='a9f80490-241b-11e7-8f1c-176db802f8de')))
This outputs:
    Broadband: can I get a puppy
    Narrowband: can I get a phone
So the broadband model now gets it right. The narrowband result is unchanged, most likely because the audio quality is too poor for that model to work with. There are also more specialised language models for children, built by others, to cope with this.
One swallow does not make a summer.
So this is one example of one phrase. For real testing, you should test the whole model. In a demonstration from development, the utility was able to increase an S2T model's accuracy from around 50% to over 80%.
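To test more than one phrase, you need a score over a whole set of recordings. Word error rate is one common measure. I don't know which measure the demonstration used, so treat this as a sketch rather than the way those figures were produced. The names `word_errors` and `word_error_rate` are my own, and the sample pairs simply reuse the three transcripts shown in this post against the phrase that was actually spoken.

```python
def word_errors(ref_words, hyp_words):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    # previous[j]: edits to turn the reference words seen so far
    # into the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitute = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitute, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1]


def word_error_rate(pairs):
    """Pool errors over (reference, hypothesis) pairs and divide by reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        errors += word_errors(ref_words, hypothesis.lower().split())
        words += len(ref_words)
    if words == 0:
        raise ValueError('no reference words to score against')
    return errors / words


# (what was said, what S2T returned), taken from the outputs in this post.
pairs = [
    ('can I get a puppy', 'can I get a puppy'),
    ('can I get a puppy', 'can I get a phone'),
    ('can I get a puppy', 'Greg it appropriate'),
]

print('WER: {:.0%}'.format(word_error_rate(pairs)))
```

In practice you would build `pairs` by running the `wav` helper over a folder of recordings with known transcripts, once with and once without the `customization_id`, and compare the two scores.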