Late last year, I was talking with a friend and colleague about some of the new services Amazon Web Services debuted at the 2016 AWS re:Invent conference. Two in particular piqued our curiosity: Amazon Polly and Amazon Lex.
Briefly, Amazon Polly is a text-to-speech engine. Give it a string of words and an optional lexicon, and it returns a stream of audio in one of several formats. Amazon Lex, on the other hand, not only translates speech to text but also attempts to understand the intent behind the user’s utterance.
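Here is a minimal sketch of the parameters for Polly’s SynthesizeSpeech operation. Text, OutputFormat, SampleRate, VoiceId and LexiconNames are Polly’s own field names. The helper name buildSpeechRequest, the Joanna voice and the choice of 8 kHz PCM are assumptions for illustration, not necessarily what weatherphone uses. 8 kHz PCM was picked because it suits narrowband telephone audio.

```javascript
// Hypothetical helper: builds parameters for Amazon Polly's SynthesizeSpeech call.
// The voice and sample rate are illustrative assumptions, not weatherphone's settings.
function buildSpeechRequest(text, lexiconNames = []) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  const params = {
    Text: text,
    OutputFormat: "pcm", // raw 16-bit signed little-endian mono samples
    SampleRate: "8000",  // Polly accepts "8000" or "16000" for PCM output
    VoiceId: "Joanna",
  };
  // Lexicons are optional; only include the field when some were given.
  if (lexiconNames.length > 0) {
    params.LexiconNames = lexiconNames;
  }
  return params;
}

const request = buildSpeechRequest("Currently 23 degrees and sunny.", ["weatherphone"]);
console.log(request.OutputFormat, request.SampleRate); // → pcm 8000
```

With the AWS SDK for JavaScript (v2), this object would be passed to `new AWS.Polly().synthesizeSpeech(params, callback)`. The audio comes back as a Buffer in `data.AudioStream`.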
With these two technologies, opportunities abound. That got us thinking: what could we do with them that has not been done before?
We won’t get into those specific ideas here, but I will walk through a proof of concept of what this technology can do.
The Premise
Before leaving for break, I said that I would figure out a way to demonstrate the technology. One day, as I sat in front of my computer looking at the weather report (I use the Currently extension in Chrome), it dawned on me: what about those old “dial-a-weather-report” systems of yesteryear?
As Walt Disney would say, “the way to get started is to quit talking and begin doing.” And so I began…
The Pieces and Parts
To relate back to the original discussion, I knew this had to have a telephony interface. Here are the pieces and parts that make this possible:
- Asterisk - Open-source PBX platform. I happen to have one in my basement running my phone system. While not strictly necessary, it was good to have it connected to the PSTN so that others on the outside could test. I use Flowroute as my ITSP.
- I developed against Asterisk 13. Specifically, the important part is the Asterisk REST Interface (ARI).
- Node.js - Server-side JavaScript runtime.
- Weather Underground API - I needed a backend to get weather forecast information. WU provides a limited developer account at no cost.
Overall Architecture
There are three major components in the weatherphone system.
- On the far left side is Asterisk.
- In the middle is the weatherphone application.
- The weatherphone application is itself a client of Asterisk. It uses a WebSocket connection to maintain an ongoing session with Asterisk.
- When weatherphone starts and opens the WebSocket, it registers itself as an application within Asterisk.
- The weatherphone app registers under a specific name (aws-polly-weatherphone).
Dialplans in Asterisk can reference the registered name. For example, you might see the following line in a dialplan:
exten => 8000,n,Stasis(aws-polly-weatherphone)
The Asterisk application Stasis transfers control of the call to an ARI application (here, weatherphone).
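In ARI terms, registration happens when the client opens a WebSocket to Asterisk’s /ari/events endpoint. The application name goes in the app query parameter, and the ARI credentials go in api_key as username:password. The sketch below only builds that URL. The helper name, host and credentials are placeholders, as is the assumption that Asterisk’s HTTP server listens on its default port 8088; adapt them to your own http.conf and ari.conf.

```javascript
// Hypothetical helper: builds the ARI events WebSocket URL. Opening a WebSocket
// to this URL is what registers the Stasis application name with Asterisk.
// Host, port, and credentials are placeholders, not weatherphone's configuration.
function buildAriEventsUrl({ host, port = 8088, user, password, app, secure = false }) {
  const scheme = secure ? "wss" : "ws";
  const appParam = encodeURIComponent(app);
  // Encode user and password separately so the ":" separator stays literal.
  const apiKey = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `${scheme}://${host}:${port}/ari/events?app=${appParam}&api_key=${apiKey}`;
}

const eventsUrl = buildAriEventsUrl({
  host: "localhost",
  user: "asterisk",
  password: "secret",
  app: "aws-polly-weatherphone",
});
console.log(eventsUrl);
// → ws://localhost:8088/ari/events?app=aws-polly-weatherphone&api_key=asterisk:secret
```

Once connected, the client receives JSON events over the socket. StasisStart arrives when a call enters Stasis(aws-polly-weatherphone), and ChannelDtmfReceived arrives when the caller presses a key.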
- Weatherphone handles incoming events from ARI. Depending on the type of event (inbound call, DTMF tone, voice recording, etc.), weatherphone can react in various ways.
- Currently, weatherphone listens for DTMF signals from the caller. Once it has recorded five digits for a given caller, it sends that zip code to WU for weather data.
- On the far right side are AWS and Weather Underground.
- Weatherphone calls Polly on demand to convert text strings into audio.
- Weatherphone also calls WU to get current conditions and forecast data.
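The DTMF step can be sketched as a per-channel digit buffer. Each ChannelDtmfReceived event from ARI identifies the channel and the digit pressed, and once five digits arrive the buffer yields a zip code. ZipCollector and buildWuUrl are hypothetical names. The URL follows the shape of the legacy Weather Underground API, with the conditions and forecast features combined in one request. That API has since been retired, so treat the URL as illustrative only.

```javascript
// Hypothetical per-caller DTMF buffer: feed it digits as ChannelDtmfReceived
// events arrive; it returns a zip code once enough digits are collected.
class ZipCollector {
  constructor(length = 5) {
    this.length = length;
    this.buffers = new Map(); // channel id -> digits collected so far
  }

  // Returns the completed zip code, or null if more digits are needed.
  addDigit(channelId, digit) {
    if (!/^[0-9]$/.test(digit)) {
      return null; // ignore *, #, and A-D during zip entry
    }
    const digits = (this.buffers.get(channelId) || "") + digit;
    if (digits.length < this.length) {
      this.buffers.set(channelId, digits);
      return null;
    }
    this.buffers.delete(channelId); // reset so the caller can enter another zip
    return digits;
  }

  // Call on hangup (e.g. StasisEnd) so abandoned buffers do not accumulate.
  clear(channelId) {
    this.buffers.delete(channelId);
  }
}

// Assumed legacy Weather Underground URL shape; WU has since retired this API.
function buildWuUrl(apiKey, zip) {
  return `http://api.wunderground.com/api/${encodeURIComponent(apiKey)}/conditions/forecast/q/${zip}.json`;
}

const collector = new ZipCollector();
let zip = null;
for (const d of "90210") {
  zip = collector.addDigit("channel-1", d);
}
console.log(zip); // → 90210
console.log(buildWuUrl("YOUR_KEY", zip));
```

Keying the buffer by channel ID keeps concurrent callers from mixing their digits. Clearing it on hangup keeps abandoned calls from leaking memory.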
Current Status
The code base, which can be found on GitHub, is functional. It’s not perfect, but it does prove that an Asterisk-to-Polly interface works.
What’s missing is Lex. Why? Amazon Web Services hasn’t yet given me access to the limited preview. Until it does, you have to enter your zip code on the keypad via DTMF.
Issues and Limitations
If you want to use Asterisk as your telephony interface, be aware that it has no concept of streaming audio: everything has to be a file. This means you must either run weatherphone directly on the Asterisk server or implement some sort of shared storage between the two services. I’ve looked around the Asterisk wiki and forums, and the developers do not seem to consider this a priority. The closest thing is that Asterisk 14 can play back audio from a URI. In fact, they are still working on their own text-to-speech engine. Personally, I would rather they focus on being an excellent telephony solution than on being a text-to-speech engine.
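One way to live with the file-only model is to write Polly’s raw PCM output to disk and point Asterisk at it. This sketch assumes Polly was asked for PCM at 8 kHz. That matches Asterisk’s headerless signed-linear .sln format, so the bytes can be saved unchanged. The helper name, directory and naming scheme are hypothetical. The returned sound: URI omits the extension because Asterisk chooses a format from the files it finds.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: stores Polly audio (requested with OutputFormat "pcm" and
// SampleRate "8000") where Asterisk can play it. Asterisk's .sln format is
// headerless 16-bit signed linear audio at 8 kHz, so the bytes are written as-is.
// Assumes a little-endian Asterisk host, matching Polly's little-endian PCM.
function savePromptForAsterisk(pcmBuffer, dir = path.join(os.tmpdir(), "weatherphone")) {
  fs.mkdirSync(dir, { recursive: true });
  const base = path.join(dir, crypto.randomBytes(8).toString("hex"));
  const file = `${base}.sln`;
  fs.writeFileSync(file, pcmBuffer);
  // ARI media URIs name the file without its extension.
  return { file, media: `sound:${base}` };
}

const prompt = savePromptForAsterisk(Buffer.alloc(1600)); // 0.1 s of silence at 8 kHz
console.log(prompt.media);
fs.unlinkSync(prompt.file); // clean up the demo file
```

The media value would be passed to ARI’s play operation on the channel (POST /channels/{channelId}/play). The path is local to the machine running weatherphone. Asterisk can only read it if both run on the same host or share storage, which is exactly the limitation described above. The file can be removed after the channel’s PlaybackFinished event.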
To nitpick a little, I did not give Polly a lexicon to help it render the weather forecast. So if the wind is from the west-southwest, all you hear from Polly is “WSW”. Likewise, a temperature of “23F” comes out with the “F” pronounced literally. That should be fairly straightforward to fix.
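Polly lexicons are W3C Pronunciation Lexicon Specification (PLS) documents. You upload one with the PutLexicon operation and then reference it by name through LexiconNames. A lexicon maps specific words, so it suits a fixed vocabulary like compass points. A reading such as “23F” is easier to handle by normalizing the text before synthesis, which is a separate technique rather than a lexicon feature. The helper names and the substitutions below are illustrative assumptions.

```javascript
// Hypothetical helpers for making forecast text easier for Amazon Polly to read.

// Multi-letter compass abbreviations and their spoken forms. Single letters
// (N, E, S, W) are left out so the lexicon does not rewrite unrelated text.
const COMPASS = {
  NNE: "north-northeast", NE: "northeast", ENE: "east-northeast",
  ESE: "east-southeast", SE: "southeast", SSE: "south-southeast",
  SSW: "south-southwest", SW: "southwest", WSW: "west-southwest",
  WNW: "west-northwest", NW: "northwest", NNW: "north-northwest",
};

// Builds a PLS lexicon whose <alias> entries tell Polly to speak the full
// phrase when the abbreviation appears in the text.
function buildCompassLexicon() {
  const lexemes = Object.entries(COMPASS).map(([abbr, spoken]) =>
    [
      "  <lexeme>",
      `    <grapheme>${abbr}</grapheme>`,
      `    <alias>${spoken}</alias>`,
      "  </lexeme>",
    ].join("\n"));
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<lexicon version="1.0"',
    '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"',
    '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"',
    '    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon',
    '      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"',
    '    alphabet="ipa" xml:lang="en-US">',
    ...lexemes,
    "</lexicon>",
  ].join("\n");
}

// Text normalization before synthesis: "23F" or "23°F" -> "23 degrees Fahrenheit".
// A lexicon cannot list every number, so a regular expression handles this part.
function spellOutTemperatures(text) {
  return text.replace(/(-?\d+)\s?°?F\b/g, "$1 degrees Fahrenheit");
}

console.log(spellOutTemperatures("Winds WSW at 10 mph. High of 23F."));
// → Winds WSW at 10 mph. High of 23 degrees Fahrenheit.
```

For example, the lexicon string could be uploaded under the name weatherphone, and that name added to LexiconNames on each request. spellOutTemperatures would then run on each forecast string before it is sent to Polly.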
Obviously, the code I wrote isn’t production-ready in any sense. It does what it needs to do to prove a point, but it is buggy and far from optimal. Proceed at your own peril!
Wrap-up
Overall, this proves that it is possible to integrate a cloud-based text-to-speech engine with a popular telephony solution. As soon as I get access to Lex, I will continue improving this code base so that you can ask weatherphone specific questions and it will attempt to give you a relevant answer.
With those pieces together, imagine the possibilities!
Get the code here: https://github.com/seliger/weatherphone