Sipping tea with Alexa

Sipping tea with Alexa
Photo by Mark Farías / Unsplash

Our goal

The amount of time a specific tea steeps, and the temperature of the water, make a big difference in the resulting flavor profile. This is a topic I have covered extensively on Tea-Guy.com.

The goal for this simple project is to be able to ask Amazon's Alexa digital assistant to run a timer for a given type of tea, where the duration is predetermined based on that tea type.

For the project, we'll start with something very simple: the skill accepts a single command and invokes a timer, assuming a generic black tea as the type. The end goal will include deeper logic surrounding synonyms for different kinds of teas and tisanes, for more user-friendly handling.

TL;DR

In this tutorial, we create an all-new Alexa skill to run special timers based on the kind of tea uttered by the user. You can find the code examples for this project using the following links to my GitHub profile, repeated at the end of the tutorial as well.

  • Blank initial project downloaded from the Alexa developer console
  • The completed and operational Alexa skill, as created in this tutorial
🔄
It's worth noting that you could take this skill and get significantly more complex than our final product, leveraging AWS or other cloud resources for hosting custom APIs or data sources and having Alexa make requests and parse responses, but that's out of scope here.

Prerequisites

  • A free Amazon Alexa developer account
  • An understanding of Node.js, JavaScript, and JSON
  • An understanding of Git and source control tools
  • A text editor or IDE for writing Node.js, JavaScript, and JSON code (I recommend VS Code, also free)
  • A device with Alexa enabled using the same Amazon account as your Alexa developer account
⚠️
Note that you can also create skills for Alexa using the Python programming language, but I have chosen to use Node.js for this tutorial.

How to read this tutorial

Code will be visible in markdown blocks like the one below. These code examples will evolve as we proceed.

{
    "key": "value"
}
⚠️
Also note that comments are included in JSON examples to help understand the content, but comments are not a supported JSON entity type, and a direct copy/paste of the code will cause issues.

Reference the GitHub project links at the bottom of this tutorial for both starting and completed code.

Specific notations will be made throughout using colored panels for clarity. An example of such a panel is the following:

🍵
Mmm, tasty matcha!

Agenda

We'll walk through the following steps as part of this tutorial.

  1. Create the new Alexa skill in the developer dashboard
  2. Update the invocation term (how Alexa knows to use the skill)
  3. Activate the Timers permission for the skill
  4. Get settled in VS Code
  5. Review & update the interaction model file
  6. Update the NPM package.json file
  7. Review handlers for default intents
  8. Create a handler for the custom intent to brew tea
  9. Save, build, deploy, and test the project
  10. Let's get fancy (adding slots and synonyms)
  11. Download project files

Let's do this

Create a new Alexa skill

If you're new to the Alexa developer console, there are a variety of options here, but we're going to keep things simple.

👉
The following steps can all be completed using the Alexa Skills Kit (ASK) CLI on the command line for any major OS.

There are solid extensions for VS Code created by the Alexa Skills team at Amazon, hosted on GitHub. It's out of scope to cover these here, but I will assume their use later on when we transition to writing the JavaScript code for our skill.
A screenshot of the configuration screen for adding a new Alexa skill, with highlights for the name field, custom model type, and Alexa-hosted Node.js backend
Screenshot of the Skill name configuration screen in the Alexa developer console
  1. Click "Create Skill" from the dashboard
  2. Enter a name for the skill (I've chosen "Tea Timer")
  3. Choose a "Custom" model
  4. Choose "Alexa hosted (Node.js)" for the backend
  5. Click "Create Skill" in the upper right
A list of UI cards for base Alexa skill templates
Screenshot of the Alexa skill template choices
  1. On the resulting screen choose "Start from scratch"
  2. Click "Continue with template" in the upper right
⚠️
The provisioning of the new skill may take a few minutes; feel free to grab some tea.

These steps will generate a stub of the new Alexa skill and drop you into the skill's Build configuration screen. There's nothing really here yet, but you'll likely see a notification that initial build and deploy activities have been successful.

The skill home screen after the skill has been provisioned
Screenshot of the custom skill builder checklist screen in the Alexa developer console

Change the invocation name

The invocation name is the phrase a user will speak (utter) in order to engage the skill through their Alexa devices.

The default provided for a new custom skill is simply "change me". But this will not suit our purposes, and we should change it. I recommend "tea time", but you can choose any phrase you like (e.g. "beam me up" or "now steeping").

To change this value, open the "Invocations" item under Custom in the left navigation. Then select "Skill Invocation Name" and enter your new invocation name.

Finally, Alexa won't recognize this change until you save the model and build it.

  1. Click "Save Model" above the field you edited
  2. Once the save is confirmed, click "Build Model" a couple of buttons to the right
An image of the Alexa skill invocation name screen
Screenshot of the Invocation name screen in the Alexa Skill developer console

Activate Timer Permissions

It's an interesting design choice on Amazon's part, no doubt made for security and user-experience reasons: an Alexa skill needs to request one-time permission from the user before it can create and manage timers on their behalf. As developers, we need to tell Alexa that this skill must seek that permission from the user.

To do this, select "Permissions" under the "Tools" heading at the bottom of the left navigation.

The resulting screen has a variety of permission toggles. Toggle "Timers" to active at the very bottom of the list.

⚠️
There's no need to save again after this change; the system handles it on its own. However, we'll need special handling in the skill's code later to ask the user for permission directly if they haven't yet authorized the skill for this activity.

Time to get dirty with some code

🧠
From here on out we will be switching between VS Code and the Alexa developer console.

You can do most of what you need using just VS Code and the Alexa Skills Kit tooling if you like, but there's no effective way to test timers using the CLI or web console, so you'll need an Alexa-enabled device for testing when it's time to do so.

Clone the Alexa skill project

⚠️
We will be using the Alexa Skills Kit extensions for VS Code to clone our project, make changes, commit, and push code back up.

Please reference the GitHub repo for those extensions for a walkthrough on installing, configuring, and logging in to your Alexa developer account so you can follow along.

Note: you can certainly copy/paste your code into the Alexa developer console, or even make your code edits directly in the console, though this is neither as developer-friendly nor (in my opinion) good practice.

Once the extension is installed and configured, and you're logged in with a profile set up (note: this requires a Git install), click the Alexa Skills Kit icon in the left navigation of the VS Code UI.

Screenshot of VS Code with the Alexa Skills Kit extension selected at the left
Screenshot of the Alexa Skills Kit in VS Code

From here, a secondary area loads where you might normally see a file browser or project view, depending on your setup.

There are a few options here with the ASK extension:

  • Create a skill
  • Download and edit skill
  • Open local skill

For this exercise, since we already created the skill in the developer console, you can choose "Download and edit skill". This will present a popover list of any skills you have tied to the Alexa developer account you logged in with. If you have many, the list can be filtered by name by typing in some text for the filter to match.

A screenshot of selecting the Tea Timer skill from the list of available skills
Screenshot of the Tea Timer project appearing in the Alexa Skills Kit UI within VS Code

Select the new skill you created earlier in this exercise, choose a place for the files to be copied locally, and watch as the ASK extension automatically clones the project code and details from the cloud to your local system.

A screenshot of the Tea Timer resources copied locally
Screenshot of the imported Tea Timer project in VS Code

Examining the skill's initial contents

When first cloned, the skill is very basic. It has no "wiring" to connect anything together yet; we will get to that. But, assuming you're new to the Alexa skill community, it's helpful to cover some specific elements first.

Every Alexa skill has a set of regions it can be deployed to. You can set these later in a project, but you likely selected a default region for deployment when first creating your Alexa developer account. For example, I deploy to US East - Virginia.

  • The lambda directory will hold our code for AWS Lambda functions. We'll focus mostly on index.js
  • The skill-package directory holds central configuration options for the skill itself. This includes the skill.json file which handles general skill description details for the packaged skill and how it appears in the Alexa skill market. It also holds the directory and files for interactionModels, which we will define later. Interaction models are linguistically bound, meaning the user's spoken language needs to be localized. We will focus on only US English (en-US.json).
  • The .gitignore file is used to ignore certain files from the filesystem. We will be updating this right away.

Update the .gitignore file

The default .gitignore file looks like this:

ask-resources.json
.ask
.vscode
lambda/node_modules

We need to extend this by adding the following lines to the bottom:

node_modules/
hooks/
_dev/

Together these should cover our needs for this project. The Alexa environment simply doesn't need these extra files and details clogging everything up.

Make your changes, and save the file. You should see an M to the right of the file name in the list, indicating the file has been modified.

A screenshot showing the .gitignore file has been updated
screenshot of the .gitignore file in VS Code's file list

Go ahead and stage this change before we proceed with other alterations.

To stage your change, click on the "Source Control" tab at the left in VS Code. From there you can simply click on the (+) icon to the right of the file name, or you can right-click for a context menu and choose "Stage Changes".

A screenshot showing the options for staging changes to files
Screenshot of the Stage Changes option in the VS Code context menu for the .gitignore file
⚠️
Staging your changes doesn't commit them, but it does mark them nicely so when you go to commit later you can see what work has been batched up and what hasn't. I tend to stage my changes as soon as I am happy with the local results of a given update. I'll commit them later.

Updating the skill interaction model

Next, let's take a look at the interactionModel. The file for this should be named en-US.json if your project is set to US English by default, and is located in skill-package/interactionModels/custom. We will reference this file frequently later, but we'll make some basic edits now.

⚠️
The following code examples include comments, which are not supported in JSON files. Be sure to strip the // lines from the code if you go to copy/paste from the code examples.

Any new skill has a series of default intents added. While you could technically remove them, it's bad form to do so. Although we will update the AMAZON.FallbackIntent later, we will be leaving the following intents alone:

  • AMAZON.CancelIntent
  • AMAZON.HelpIntent
  • AMAZON.StopIntent
  • AMAZON.NavigateHomeIntent

Our primary focus will be on replacing the HelloWorldIntent included by default in this custom skill project.

But first, let's have a quick review of the structure of this file.

⚠️
Apologies for my blog not currently supporting syntax highlighting, I've tried to break the content up a bit to make it more readable, but I recognize it's not as friendly as I'd like it to be.
{
  // The interaction model is at the top level of any skill and defines
  // the breakdown of the skill from there
    
  "interactionModel": {
      
    "languageModel": {
      // The invocation name is something we updated in the web UI, 
      // and may not be reflected in the file when you go to pull it down
      // it is okay to alter the invocation name here as well
        
      "invocationName": "tea time",
      
      // Intents are the core of most conversational systems using natural
      // language understanding technologies. In short, the goal is to 
      // understand the intent of a user and perform actions (fulfillment)
      // of the user's request based on the intent
        
      "intents": [
        {
          // A default intent for a user to cancel their requested action
          
          "name": "AMAZON.CancelIntent",
          "samples": []
        },
        {
          // A default intent for a user to get help with the skill
          
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          // A default intent for the user to stop the skill's execution
          
          "name": "AMAZON.StopIntent",
          "samples": []
        },
        {
          // This is a custom intent, note that it isn't prefixed with AMAZON
          // and we see some sample "utterances" for the intent helping Alexa
          // to understand what a user may say to trigger the intent
        
          "name": "HelloWorldIntent",
          "slots": [],
          "samples": [
            "hello",
            "how are you",
            "say hi world",
            "say hi",
            "hi",
            "say hello world",
            "say hello"
          ]
        },
        {
          // A default intent allowing the user to return to the skill's
          // default prompt
          
          "name": "AMAZON.NavigateHomeIntent",
          "samples": []
        },
        {
          // A default intent to catch all user input which doesn't seem
          // to match a defined intent for the given skill, if say a user
          // asks to purchase more tea, but our tea timer doesn't provide
          // purchase functionality
          
          "name": "AMAZON.FallbackIntent",
          "samples": []
        }
      ],
      // Types allow for custom definition of selectable options, and
      // for more complex conversations leveraging "slots" and "synonyms"
      // which will be covered later in this tutorial
      
      "types": []
    }
  },
  // Version defines the version of the skill
  
  "version": "1"
}

Alright, now that we've taken a look at the file's structure and have a somewhat better understanding of what each pre-existing element does, we can start to poke around and make some changes.

Here our intent is listed as HelloWorldIntent; let's go ahead and replace that with something more meaningful for our Tea Timer skill. I have chosen to use BrewTeaIntent, which I think is a bit on the nose, but nicely lacks ambiguity.

The samples also do not match our needs, so we can update those now, as well. I have provided my configuration below, but you may use your own or add on to what I've provided. Regardless, the samples should match a pattern where the user would utter a phrase like so:

Alexa, tea time, brew tea
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "tea time",
      "intents": [
        {
          "name": "AMAZON.CancelIntent",
          "samples": []
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          "name": "AMAZON.StopIntent",
          "samples": []
        },
        {
          "name": "BrewTeaIntent",
          "slots": [],
          "samples": [
            "brew",
            "brew tea"
          ]
        },
        {
          "name": "AMAZON.NavigateHomeIntent",
          "samples": []
        },
        {
          "name": "AMAZON.FallbackIntent",
          "samples": []
        }
      ],
      "types": []
    }
  },
  "version": "1"
}

Go ahead and save your changes, and stage them.

That's it for our interactionModel for the moment. We will make changes again later when we come back to extend the skill. But we can move on for now.

What cute packaging

Next, let's take a peek at package.json, located in the lambda folder.

This file is nice and succinct and is a simple NPM package file.

Let's just update the package details to replace the hello-world entry with our own. While we're here, we'll also declare the axios and moment packages, which we'll require later in index.js; NPM resolves whatever is listed here when the skill is built (your version numbers may be newer than mine). My finished copy is below. We should not need to revisit this later.

{
  "name": "tea-timer",
  "version": "1.0.0",
  "description": "Alexa utility for brewing better tea using timers.",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Amazon Alexa",
  "license": "Apache License",
  "dependencies": {
    "ask-sdk-core": "^2.7.0",
    "ask-sdk-model": "^1.19.0",
    "aws-sdk": "^2.326.0",
    "axios": "^0.21.4",
    "moment": "^2.29.1"
  }
}

Save and stage your changes to this file.

⚠️
Note that the license entry should remain untouched, and your dependency versions may be newer than those shown here. You probably don't want to forcibly downgrade the packages.

Going on the lambda

The real "meat" of our Alexa skill lies in the index.js file. This is where the logic and details live, including how to manage speaking text to the user and to receive and process input from the user.

🧠
While skills can be made up of a variety of scripts, index.js is typically the entry point for the skill. You can override the default entry script using the package.json file's "main" key/value pair.

Let's take a moment to break down and understand the structure of a skill script.

  1. Predefined "require" calls effectively define dependencies
  2. The handlers are functions that define how to manage request, response, and fulfillment for a given intent, whether that intent is one Amazon provides, or one you have built (see the sketch after this list).
  3. The LaunchRequestHandler is responsible for the skill's initial response to invocation. Basically when a user says "Alexa, tea time", how should the skill respond? This handler is the single pathway into the skill.
  4. The default file has predefined handlers for the default intents we noticed in the en-US.json file earlier. You can see, and customize, the skill's responses using these handlers, giving flexibility in the personality the skill takes with the user.
  5. The exports section exposes the handlers for use. You can have functions called purely within the script and not have them exported.
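
Abstractly, every handler shares the same two-part shape: a canHandle function that tells the SDK whether this handler should claim the incoming request, and a handle function that builds the response. Here's a minimal sketch mirroring the handlers we'll write later (the ExampleIntent name is hypothetical):

const ExampleIntentHandler = {
    // canHandle returns true when this handler should process the incoming request
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'ExampleIntent';
    },
    // handle builds and returns the skill's response
    handle(handlerInput) {
        return handlerInput.responseBuilder
            .speak('Hello from the example handler.')
            .getResponse();
    }
};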

Before we just start digging in, I've found that it's best to outline what we need to do as a way to structure our actions and ensure we don't miss anything notable.

Here's our agenda for this script:

  1. Understand core functions of Alexa skills
  2. Add more dependencies
  3. Add support for calling Alexa's timers API
  4. Define the timer's defaults (time, timer name, what to say when the timer is done, etc.)
  5. Customize & update the launch handler
  6. Review and update our spoken phrases for other handlers
  7. Review exports & ensure everything we need is there

Core Alexa skill functions

Okay, so now that we have a task list, let's look at some definitions of the core functions for an Alexa skill.

The following list of functions really represents the primary tools for building an Alexa skill. This is true regardless of whether you're building a simple skill or a super complex skill with dozens of nested and/or interfacing intents.

  • speak(): Alexa will say the provided text out loud
  • reprompt(): Alexa will say the provided text out loud if there's no response to a previous ask for input from the user
  • withSimpleCard(): for Alexa devices with a screen, provides a card title and descriptive text in addition to the spoken output
  • getResponse(): finalizes and returns the built response, leaving Alexa listening if a reprompt was set
  • getRequestType(): returns the type of the incoming request, used in a handler's canHandle check
  • getIntentName(): returns the name of the matched intent, used to route specific intents to a handler
  • handlerInput.responseBuilder: the builder used to assemble a response to user input

There are certainly more functions we could call, but these are the foundational building blocks leveraged by any Alexa skill to interface with the user.
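
To give a feel for how these chain together, here's a minimal sketch of a response being built inside a handler (the wording is just an example):

return handlerInput.responseBuilder
    .speak('Would you like me to start a tea timer?')      // spoken out loud
    .reprompt('Shall I start a tea timer for you?')        // spoken again if the user stays silent
    .withSimpleCard('Tea Timer', 'Starting a tea timer.')  // card shown on devices with screens
    .getResponse();                                        // finalize the response and listen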

Dependencies

We really only need a couple of additional dependencies beyond the defaults provided. It's worth noting that these dependencies get resolved by NPM, so you have the flexibility to leverage any NPM libraries you're already familiar with.

We want to add support for Axios for making HTTP API calls, and for Moment, which we'll use for parsing and manipulating dates and times.

The top of your index.js should result in something like the following:

/* *
 * This sample demonstrates handling intents from an Alexa skill using the Alexa Skills Kit SDK (v2).
 * Please visit https://alexa.design/cookbook for additional examples on implementing slots, dialog management,
 * session persistence, api calls, and more.
 * */
const Alexa = require('ask-sdk-core');
const axios = require('axios');
const moment = require("moment");

⚠️
Note that this is a JavaScript file, not a JSON file. And you can leave the comment blocks in place.

Defining the timer

Before we get to invoking it, let's make sure we have constructed our basic timer details. This should happen immediately after our dependency definitions within the index.js file.

We first need to construct our timer, let's call it brewTimer. With this, we'll need to provide some attributes as required by Alexa's timers framework. This will include the duration of our timer (let's hard code it for 4 minutes), the label/name of the timer, some basic display experience details, and some triggering behavior definitions. Whew, it sounds like a lot, but it's not so bad when we take a look at the results.

const brewTimer = {
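    /* The duration uses the ISO-8601 duration format: PT4M is four minutes,
       PT1H30M would be an hour and a half */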
    "duration": "PT4M",
    "timerLabel": "Tea Timer",
    "creationBehavior": {
        /* Let's just hide this from displays for now */
        "displayExperience": {
            "visibility": "HIDDEN"
        }
    },
    /* Now we can define what happens when the timer triggers (time expires) */
    "triggeringBehavior": {
        "operation": {
            "type": "ANNOUNCE",
            "textToAnnounce": [
                {
                    /* You can have the timer say different things for different localizations */
                    "locale": "en-US",
                    "text": "Your tea is ready!"
                }
            ]
        },
        /* Should we sound an alarm as well? */
        "notificationConfig": {
            "playAudible": true
        }
    }
};

Our skill still isn't able to really do anything yet. The out-of-the-box HelloWorldHandler needs to be replaced, and we need to rewire some stuff, add API integration, and more.

Let's start with adding our API handling. There's no simple "Alexa, run a timer" function, so we instead leverage the timers API for Alexa. There are a couple of mechanisms for timers, depending on what the timer is tied to.

We could use the generic timer API, or we could use the Cooking Timer API. However, because the Cooking Timer API requires additional Smart Home dependencies, we'll just use the generic timer API.

To make this easy on the user, we're going to construct a combined handler for Alexa's built-in AMAZON.YesIntent and AMAZON.NoIntent, which I'll call YesNoIntentHandler. A combined handler is capable of mapping both a yes intent and a no intent to one function, and we'll choose what to do based on the intent provided. We could do this in separate methods, but it's usually worthwhile to combine them, since the no path is often less code this way.

The following is my code for this. Note the references to the brewTimer. Also, note the structure of a handler matching one or more intents.

This block of code can be put anywhere AFTER the LaunchRequestHandler. I put all my custom functions and handlers after all the default Alexa-provided ones. This makes scrolling to the exports section a shorter trip when I need to update it or check if I missed something.

const YesNoIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.YesIntent'
                || Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.NoIntent');
    },
    async handle(handlerInput) {

        // User intent is a 'yes'
        if (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.YesIntent') {

            /* Here we use the Moment dependency for handling time */
            const duration = moment.duration(brewTimer.duration),
                hours = (duration.hours() > 0) ? `${duration.hours()} ${(duration.hours() === 1) ? "hour" : "hours"},` : "",
                minutes = (duration.minutes() > 0) ? `${duration.minutes()} ${(duration.minutes() === 1) ? "minute" : "minutes"} ` : "",
                seconds = (duration.seconds() > 0) ? `${duration.seconds()} ${(duration.seconds() === 1) ? "second" : "seconds"}` : "";

            const options = {
                headers: {
                    "Authorization": `Bearer ${Alexa.getApiAccessToken(handlerInput.requestEnvelope)}`,
                    "Content-Type": "application/json"
                }
            };

            /* Here we call the Alexa timers API using the axios dependency */
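            /* Note: api.amazonalexa.com is the North America endpoint; for other
               regions, use the apiEndpoint value from the request envelope's
               context.System instead */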
            await axios.post('https://api.amazonalexa.com/v1/alerts/timers', brewTimer, options)
                .then(response => {
                    handlerInput.responseBuilder
                        .speak(`Your ${brewTimer.timerLabel} timer is set for ${hours} ${minutes} ${seconds}.`);
                })
                .catch(error => {
                    /* Speak a friendly error rather than failing silently */
                    handlerInput.responseBuilder
                        .speak('Sorry, I had trouble starting that timer for you.');
                    console.log(error);
                });
        }

        // User intent is a 'no'
        if (Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.NoIntent') {
            handlerInput.responseBuilder
                /* The user said no, so we don't need to getResponse, we'll just close out */
                .speak('Alright I won\'t start a timer.');
        }

        return handlerInput.responseBuilder
            .getResponse();
    }
};

In the above block, we first check that the intent is either a YesIntent or a NoIntent from Alexa. If it's a yes, we process the configured time using Moment, then construct an API call to Alexa's timers API using a bearer token.

As you can see, the NoIntent branch is very simple. The user said no, we formulated a verbal confirmation for Alexa to speak, and we close out.
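
One wiring note before moving on: like any handler, YesNoIntentHandler must be registered in the exports section at the bottom of index.js, or Alexa will never route requests to it. A sketch of that registration (your list of default handlers may differ slightly):

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        YesNoIntentHandler, // our new combined handler
        HelpIntentHandler,
        CancelAndStopIntentHandler,
        FallbackIntentHandler,
        SessionEndedRequestHandler,
        IntentReflectorHandler
    )
    .lambda();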

We'll look at how the user grants the permission behind that bearer token in the next step, as we customize the LaunchRequestHandler.

Launch request handler

The default LaunchRequestHandler simply greets the user. For our skill, launch is also the natural place to check whether the user has granted the Timers permission we toggled on earlier, to ask for it if they haven't, and then to offer to start a brew.
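
Here's a hedged sketch of what that customization can look like, using Alexa's voice-permissions flow. The permission scope string and the AskFor directive shape come from Alexa's permissions framework; the spoken wording is just an example:

const TIMERS_PERMISSION = 'alexa::alerts:timers:skill:readwrite';

const LaunchRequestHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
    },
    handle(handlerInput) {
        /* Check whether the user has already granted the Timers permission */
        const permissions = handlerInput.requestEnvelope.context.System.user.permissions;
        const scopes = permissions && permissions.scopes;
        const status = scopes && scopes[TIMERS_PERMISSION] && scopes[TIMERS_PERMISSION].status;

        if (status !== 'GRANTED') {
            /* Not yet granted: ask the user for consent by voice */
            return handlerInput.responseBuilder
                .addDirective({
                    type: 'Connections.SendRequest',
                    name: 'AskFor',
                    payload: {
                        '@type': 'AskForPermissionsConsentRequest',
                        '@version': '1',
                        permissionScope: TIMERS_PERMISSION
                    },
                    token: ''
                })
                .getResponse();
        }

        /* Permission granted: offer to start a brew (answered by YesNoIntentHandler) */
        return handlerInput.responseBuilder
            .speak('Welcome to tea time. Would you like me to brew some tea?')
            .reprompt('Shall I start a tea timer?')
            .getResponse();
    }
};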

Contextualize other statements

Other handler methods that are saying things to the user should have those statements contextualized. It feels awfully generic to say something like "I don't know about that" instead of "I don't think that's a tea timer thing, try another request."

Try updating the phrases spoken by Alexa for the following handlers to give the statements or requests context to our tea timer intents.

  • HelpIntentHandler
  • CancelAndStopIntentHandler
  • FallbackIntentHandler
  • SessionEndedRequestHandler
  • IntentReflectorHandler
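
As one example, here's a sketch of the HelpIntentHandler with its stock response swapped for something tea-flavored (the exact wording is yours to choose):

const HelpIntentHandler = {
    canHandle(handlerInput) {
        return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
            && Alexa.getIntentName(handlerInput.requestEnvelope) === 'AMAZON.HelpIntent';
    },
    handle(handlerInput) {
        /* Contextualized for the tea timer instead of a generic "You can say hello" */
        const speakOutput = 'I can run a steeping timer for your tea. Just say, brew tea.';

        return handlerInput.responseBuilder
            .speak(speakOutput)
            .reprompt(speakOutput)
            .getResponse();
    }
};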