I have a dream…

Following on from Speech to Text, let’s jump over to Text to Speech. Similar to conversation, what can make or break the system is the tone and personality you build into the system.

Developers tend to think about the coding, and not the user experience so much.

To give an example, let’s take a piece of a very famous speech from MLK. Small sample so it doesn’t take all day:

I still have a dream. It is a dream deeply rooted in the American dream.

I have a dream that one day this nation will rise up and live out the true meaning of its creed: “We hold these truths to be self-evident, that all men are created equal.”

Let’s listen to Watson as it directly translates.

It sounds like how I act when I am reading a script. 🙂

Now lets listen to MLK.

You can feel the emotion behind it. The pauses and emphasis adds more meaning to it. Thankfully Watson supports SSML, which allows you to mimic the speech.

For this example I only used two tags. The first was <parsody> which allows Watson to have the same speaking speed as MLK. The other tag was <break> which allows me to make those dramatic pauses.

Using Audacity I was able to put the generated speech against the MLK speech. Then selecting the pause areas, I can quickly see the pause lengths.

audicity

I finally ended up with this:

Audacity also allows you to overlay audio, to get a feel to how it would sound if there were crowds listening.

The final script ends up like this:

<prosody rate="x-slow">I still have a dream.</prosody>
<break time="1660ms"></break>
<prosody rate="slow">It is a dream deeply rooted in the American dream.</prosody>
<break time="500ms"></break>
<prosody rate="slow">I have a dream</prosody>
<break time="1490ms"></break>
<prosody rate="x-slow">that one day</prosody>
<break time="1480ms"></break>
<prosody rate="slow">this nation <prosody rate="x-slow">will </prosody>ryeyes up</prosody>
<break time="1798ms"></break>
<prosody rate="slow">and live out the true meaning of its creed:</prosody>
<break time="362ms"></break>
<prosody rate="slow">"We hold these truths to be self-evident,</prosody>
<break time="594ms"></break>
<prosody rate="slow">that all men are created equal."</prosody>

I have zipped up all the files for download, just in case you are having issues running the audio.

IHaveADream.zip

In closing, if you plan to build a conversational system that speaks to the end user, you also need skills in talking to people, just not being able to write.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s