Speech to Text and Conversation

I thought I would take a moment to play with Speech to Text and a utility that was released a few months ago.

The Speech to Text Utils allows you to train S2T using your existing conversational system. To give a quick demo, I got my son to ask about buying a puppy.

I set up some quick Python code to print out results:

import json
from watson_developer_cloud import SpeechToTextV1

# ctx is Service credentials copied from S2T Service. 

s2t = SpeechToTextV1(
 username=ctx.get('username'),
 password=ctx.get('password')
)

def wav(filename, **kwargs):
  with open(filename,'rb') as wav:
    response = s2t.recognize(wav, content_type='audio/wav', **kwargs)

if len(response['results']) > 0: 
  return response['results'][0]['alternatives'][0]['transcript']
else:
  return '???';

So testing the audio with the following code:

wav_file = 'p4u-example1.wav'
print('Broadband: {}'.format(wav(wav_file)))
print('NarrowBand: {}'.format(wav(wav_file,model='en-US_NarrowbandModel')))

Gets these results:

Broadband: can I get a puppy 
NarrowBand: can I get a puppy

Of course the recording is crystal clear, which is why such a good result. So I added some ambient noises from SoundJay to the background. So now it sounds like it is in a subway.

Running the code above again get’s these results.

Broadband: Greg it appropriate 
Narrowband: can I get a phone

Ouch!

Utils to the rescue!

So the purpose of asking about a puppy is that I have a sample conversation system that is about buying a dog. Using that conversation file I did the following.

1: Installed Speech to Text Utils.

2: Before you begin you need to set up the connection to your S2T service (using service credentials).

watson-speech-to-text-utils set-credentials

It will walk you through the username and password.

3: Once that was set up, I then tell it to create a customisation.

watson-speech-to-text-utils corpus-from-workspace puppies4you.json

You need to map to a particular model. For testing, I attached it to en-US_NarrowbandModel and en-US_BroadbandModel.

4: Once it was run, I get the ID numbers for the customisations.

watson-speech-to-text-utils customization-list

Once I have the ID’s I try the audio again:

wav_file='p4u-example2.wav'
print('Broadband: {}'.format(wav(wav_file,customization_id='beeebd80-2420-11e7-8f1c-176db802f8de',timestamps=True)))
print('Narrowband: {}'.format(wav(wav_file,model='en-US_NarrowbandModel',customization_id='a9f80490-241b-11e7-8f1c-176db802f8de')))

This outputs:

Broadband: can I get a puppy 
Narrowband: can I get a phone

So the broadband now works. Narrowband is likely the quality is too poor to work with. There is also more specialised language models for children done by others to cope with this.

One swallow does not make a summer.

So this is one example, of one phrase. Really for testing, you should test the whole model. From a demonstration from development, it was able to increase a S2T model accuracy from around 50% to over 80%.

 

 

Leave a Reply