Dynamic Toll Data: My First “Real” Alexa Skill

Summary

This project uses a combination of screen-scraping and an API to obtain current travel speeds and the current variable toll costs for Interstate 66 inside the beltway in northern Virginia, and provides the information to users via Amazon’s Alexa voice service. To try it out, first enable it (“Enable sixty six tolls”). From then on, just say “Open sixty six tolls.”

Resources

The project uses Alexa’s voice service. The code is written in Python 3, using the Alexa Skills Kit SDK for Python. The code runs on AWS’ lambda service. It also makes (minimal) use of DynamoDB to store user-specific information. Travel times are scraped from the Virginia DOT’s (VDOT’s) 511 Virginia Traffic Information website. The real-time toll prices are obtained via an API to VDOT’s SmarterRoads data portal. Web scraping and XML parsing was done using Python’s Beautiful Soup library.

The python code as well as the interaction model (a JSON file) are available at https://github.com/ViennaMike/I-66-Tolls

Background

I was looking for a project that would use one of the data sets on the SmarterRoads portal, and I decided that it would be useful to be able to check the dynamic tolls on Interstate 66 inside the Beltway in Northern Virginia. Inbound traffic is tolled between 5:30 and 9:30 am, while outbound traffic is tolled between 3:00 and 7:00 pm.

The tolls are dynamically adjusted to maintain high speeds. While the toll may change between checking the cost at home and the time a driver reaches their entrance, it’s still useful to have an idea of the highly variable toll, especially since tolls for the entire 10 mile length have sometimes spiked at over $40. 

I had previously written a simple quiz skill using Amazon’s template, but this was my first custom skill.

Description

The overall architecture for the Alexa skill is shown below:

High Level Architecture

When a user interacts with the skill, the input is processed according to the interaction model that the developer defines in the Alexa Skill builder. This is captured in a JSON file. The skill builder is also where you tell the skill where to find the execution code for handling requests and ready a skill for certification and distribution.

In the case of 66 Tolls, there are seven custom intents, along with the Alexa built-in intents such as HelpIntent, FallbackIntent, StopIntent, etc. The custom intents are:

  • get_speeds for getting speeds and travel times for the two roughly parallel travel options (I-66 and US-50)
  • get_toll_hours to get static information on the hours that tolls are in effect
  • get_details to get additional static information about how the dynamic tolling system works
  • list_interchanges to get a list of inbound and outbound entrances and exits
  • get_toll to get the current toll price for a specified direction from a given entrance to exit
  • save_trip to save the user’s most frequent entrance and exit in each direction
  • get_favs to report back to the user what trips he has previously saved.

When the skill is opened by a user who has previously saved a trip, the skill immediately returns the appropriate current inbound toll if it’s in the morning, or outbound toll if it’s in the evening or afternoon.

The Alexa Skills Kit SDK contains built-in functions to simplify interacting with Amazon’s DynamoDB NoSQL database. This skill uses a simple DynamoDB table to store the user_id (the key), most frequent inbound entrance and exit, and most frequent outbound entrance and exit.

By far the easiest part of the project was the code to get the travel times and tolls from the two VDOT sources. There is an API for the tolling data, and I had to do some simple web-scraping to get the travel time data. This code can be found in the get_travel_times() and get_tolls() functions in the code.

Developing the voice interaction model took several iterations, and I found that as time went on, I was able to improve the dialog model while reducing the number of intents and number of slot types associated with each intent. It definitely pays off to spend time thinking of how users will interact with your skill.

Because this was new to me, it took considerable time, as well as trial and error, to figure out how to write the handler code, and especially how to handle the session and persistent attributes and the interaction with DynamoDB. I used a large number of resources, with the best being the Skill SDK documentation, the Color Picker sample app, and A Beginner’s Guide to the New AWS Python SDK for Alexa, by Ralu Bolovan. As described in the documentation, the python SDK supports two coding models, one based on functions with decorators, and the other based on classes. I chose to use classes, but the Color Picker example uses decorated functions.

Some of the hassles I had came from two factors: 1) the interface to Alexa skills has changed over time. It’s been improving, but this also meant that some of the samples and tutorials on the web are out of date. 2) While there is thorough documentation, a lot of tutorials and examples focus on simple demonstrations. For this reason, it may have been better for to step back and read more of the SDK rather than always jumping in. For example, I needed to have my code do something every time an intent was called, regardless of what the intent was. It turns out this is handled by Request Interceptors and Response Interceptors, which most of the simple samples leave out. This, along with the thorough walk-through on using DynamoDB, is why I found the Beginner’s Guide to the New AWS Python SDK for Alexa to be so helpful.

I originally wanted the invocation for the skill to be “i sixty six tolls,”, but I found that Alexa had trouble recognizing this as the invocation. For this reason, I had to make the invocation “sixty six tolls” rather than “i. sixty six tolls.”

I also found that if you use Alexa’s built in “confirm” capability, then when your code is called for the first time, the handler_input.request_envelope.session.new is set to False, apparently because the built-in confirmation request kicked off the session. That’s something to be careful of. For this and other reasons, I ended up checking whether or not I had previously initialized the session attributes, rather than whether or not the session was new.

The last bug I fixed was that I hadn’t thought about what “local time” was to the server. I have been naively thinking that since I was using AWS’s Northern Virginia server, the local time would be the US eastern time zone, but all Lambda servers use GMT as their local time, which makes a lot more sense. So I used the pytz library to convert to local time.

I hope that this example will prove helpful for others who want to write their own custom Alexa skills.

Some Christmas Entertainment from Yorick


A little bit from Yorick for the holidays. As I mentioned in a previous post, Yorick now responds to “Alexa” using Amazon’s Alexa voice, and responds to “Yorick” using a voice with audio effects added to better match his appearance. I used both in this edited sequence.

Whatever you celebrate this time of year, a happy holiday greeting to one and all.

Yorick 2.0: The Personality Split

Introduction

When Yorick was first brought to life, he had Alexa’s voice. A lot of his charm was the incongruity between his appearance and his voice. At the same time, a number of folks asked about having a creepier voice and I wanted to try to do that for this Halloween.¬† An update to the AlexaPi project¬†added support for the SoX audio playback handler as an alternative to VLC. SoX has support for audio effects, so it became possible to change Yorick’s output voice. I didn’t want to lose Alexa’s voice, so I edited the AlexaPi code so that it would recognize both “Alexa” and “Yorick” as trigger words, with the output sound depending on which trigger word you used. As a result, Yorick now responds either as Alexa or with his own voice.

Just like Elliot on Mr. Robot, Yorick now has a split personality.

Conversations

I talked to Yorick, aka Alexa, a bit about Halloween:

It turns out that Yorick is a baseball fan and was rather disappointed that the Washington Nationals aren’t in the World Series. Awhile back, I asked him about going to one of the playoff games:

Technical Notes

AlexaPi uses PocketSphinx for recognizing the trigger word. The original code is set up to recognize a single trigger word or phrase, which you can easilly change in a yaml configuration file. However PocketSphinx can recognize multiple keywords or phrases selected from a python list. Some editing of the AlexaPi source code was needed in order to change the trigger from a single variable to a list. Similarly, the code was modified slightly so that once a trigger word was recognized it checks which word was used. If the trigger word is “Yorick” it changes the pitch and speed of the audio output.

I used version 1.5 of AlexaPi. This and previous versions had a problem in that the temporary file names used were the response code that the Alexa voice service returned. These sometimes included characters that were illegal for file names or that were too long for a file name. I patched these problems (and later versions of AlexaPi have fixed this problem).

In addition, the servo motion routines had to be modified slightly, as version 1.5 and later of AlexaPi begins streaming questions to the Alexa voice service as they are asked, rather than waiting until the question is finished. This results in a faster response time.