Making your first Capsule for Samsung Bixby – An Exercise in Teaching your Phone to Listen

 

Today Samsung made the Bixby Developer Studio available for download and use so that developers can start building capsules to publish on their marketplace starting in 2019. I am an early adopter of the new Bixby and wanted to share how to build a simple capsule using the new developer kit as well as share my experience with using the new platform. Readers of this blog know that I have made and published Skills for Amazon’s Alexa and published tutorials on how you can develop for that platform. Similarly this writeup will focus on a minimal application that will help you get started with the features of Bixby’s impressive development tools.

Bixby is a wonderful platform to develop for and it has top-notch development tools. Building software for Bixby is a lot like teaching someone a new skill. It uses natural language model training so you can show Bixby what parts of user phrases are important and it uses the idea of concepts to define Bixby’s understanding of what capability you are giving it. These concepts will be discussed in detail below.

Today we will be making a capsule to generate passwords made up of a random string of words, inspired by XKCD’s Password Strength comic. We will be taking advantage of Bixby’s visual interface to make a password users can easily remember as well as easily copy and use for their accounts. The XKCD algorithm sticks random, memorable words together so that passwords are complex but can also be easily recalled by the user. They also carry more entropy than the short, mangled-word passwords the comic compares them to, making them harder to crack by many brute-force methods. After giving the comic a quick look, read on.

As always, you can download and view the code on my GitHub!

THE PROBLEM

We want to build a Bixby capsule that can generate memorable passwords for users. These passwords should be of a user-specified length.

The overall requirements are:

  • Generates a password using regular English words
  • Takes in a user’s specified length
  • Displays the password graphically for copying
  • Displays a calculation of the entropy of the password so the user knows how good the password is.

THE SOLUTION

You should now have Bixby Studio installed on your system. As of writing it is available for Windows and macOS.

Create a new project by clicking File>New Capsule.

The first bit of code we will focus on is the generator.js file. This is where we define our entry point and what we are going to return.

Notice how we export the function generate- this is the function where I generate everything we need for the response you see in the screenshot above. We take our wordlist dictionary file (how to get that will be discussed shortly), we build our password using a user-specified length called numWords, and we calculate the entropy of the password. We then return a result we can parse into a nice, visual response like in the screenshot.

Whew, there’s a lot going on in here though! Let’s start with the wordlist. This is a JSON-formatted list of common English words I found searching for open-source corpora. Why JSON? As you might have surmised from the above code snippet, Bixby capsules are written in JavaScript! Importing this data as JSON makes it very easy to loop through and use, as you can see in my generate function. I stored this in a directory called lib but you can call it whatever you please. Just be sure to update the path in the generator!
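For readers without the repository open, here is a minimal sketch of what a generate function along these lines might look like. It is not the code from the repo: the tiny inline word list, the `pickWords` helper, the space separator, and the use of Node's `crypto.randomInt` are all assumptions for illustration, and inside a capsule the function would be exported in whatever way Bixby Studio's JavaScript runtime expects.

```javascript
const { randomInt } = require('crypto');

// Hypothetical stand-in for the JSON word list stored in lib/.
const wordlist = ['correct', 'horse', 'battery', 'staple', 'lamp', 'river', 'cloud', 'stone'];

// Pick numWords random words from the list (repeats are allowed).
function pickWords(words, numWords) {
  const chosen = [];
  for (let i = 0; i < numWords; i++) {
    chosen.push(words[randomInt(words.length)]);
  }
  return chosen;
}

// Build a result object: the password plus its length in words.
function generate(numWords) {
  const chosen = pickWords(wordlist, numWords);
  return { password: chosen.join(' '), length: chosen.length };
}

console.log(generate(4));
```

Looping over a plain array is all the JSON import buys us, which is why the word list format is so convenient here.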

Next, we need to discuss how we get numWords. This is the user-input. We want the user to say ‘Make me a password with three words’ and Bixby needs to know how to do that.

In the resources directory you will find endpoints.bxb. The actions your capsule can take are called endpoints. Let’s define one for generating a password:

Let’s look at what we have here: We have authorization set to none because this endpoint is public and available to any user without authorization. We have specified an action endpoint for our generate function as defined in the generator.js snippet above and we have told Bixby that the input for this endpoint is numWords. We also tell it what file it will find the definition for this endpoint in- generator.js.

Now that Bixby has an available action in the form of the endpoint, we get to the really interesting stuff- teaching Bixby what everything in our capsule means. The way we do this is via a model. In the model directory we have actions and concepts. These make up Bixby’s understanding of what your capsule can do, and we just need to write some high-level markup to make this work. Let’s start with the action our capsule is going to have- generating passwords. This will inform what concepts Bixby needs to have definitions for so that we can move on to training our natural language model.

Above is the generator.model.bxb action file. You will find it in my action directory. What does this do? Read through the comments carefully. It defines the actions Bixby will take when running this capsule, and it covers all our bases regarding various user inputs! We tell it our action is to run our generate function. We tell it to collect numWords, and we tell it that numWords is of the numWords concept type, which we will define shortly. We tell Bixby that there can be at most one numWords (so that we ignore other numbers in the user’s invocation) and we tell Bixby that this value is required. If Bixby cannot find a number in the invocation to use, we define a default initialization with four words- the same as our XKCD comic!

We then do some validation in the event we find a number in the user’s invocation. If numWords is 0 or less, we want to display some text telling the user that a password cannot have a zero or negative length (duh, but the bulk of software development is anticipating stupid). Finally we tell Bixby what our result is going to be- an instance of our PasswordResult concept, which will be of the type Calculation. This is a type Bixby provides for a result that it needs to compute or otherwise derive. Let’s get started defining what these concepts are.

If you are following along in the repo, look at the numWords concept.

This is a good minimal example of a Bixby concept. These are the variables that are key to our capsule working. You can think of them as teaching Bixby a new idea, slowly building for it the picture of what you are trying to achieve. We tell Bixby that NumWords is an integer (we don’t want fractional words). We also give a brief description of what this has to do with our capsule. For NumWords this is obvious- it is the number of words in the password.

Password is almost the same except this concept is given the ‘name’ type since we need an output string. We describe it as the output password. Entropy is similar- we describe it as the approximate bits of ‘randomness’ in our password and give it the integer type since it will be a number we calculate. Length, predictably, is an integer that represents the length in words for our password. This is used in the entropy calculation, which follows the formula from the comic: take two to the power of the number of words, then divide by the number of brute-force attempts a computer could make in a year at 1000 attempts per second. This yields an estimate of the number of years the password would take to crack in these circumstances. Finally Years is given the integer type and described as the number of years simple brute forcing would take to crack this password- it is also part of the entropy calculation we display at the bottom of our result as you can see in the above screenshot.
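The arithmetic is easier to follow in code. The sketch below follows the comic's own reasoning rather than the repo's exact implementation, so treat the function names and the 2048-word list size as assumptions: each word drawn uniformly from a list of `listSize` words contributes log2(listSize) bits, and the comic's attacker makes 1000 guesses per second.

```javascript
const GUESSES_PER_SECOND = 1000;              // the comic's assumed attack rate
const SECONDS_PER_YEAR = 60 * 60 * 24 * 365;

// Bits of entropy for numWords words drawn uniformly from listSize words.
function entropyBits(numWords, listSize) {
  return numWords * Math.log2(listSize);
}

// Years needed to try every possible password at the assumed rate.
function yearsToCrack(bits) {
  return Math.pow(2, bits) / GUESSES_PER_SECOND / SECONDS_PER_YEAR;
}

const bits = entropyBits(4, 2048);            // 4 words at 11 bits each
console.log(bits, Math.round(yearsToCrack(bits)));
```

With four words from a 2048-word list this works out to 44 bits and roughly 558 years, which the comic rounds to 550.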

The most complicated concept is our PasswordResult:

It has the type Structure because it contains multiple properties- namely every concept we have just defined. We give these properties types- I just made these the same as the property name for simplicity but they can be used in more complicated capsules to link properties together with a descriptive type. We again describe each property and what it does, tell Bixby if the property is required, and for each tell Bixby that there can be at most one value for each. This result, as you may recall from the generate method, is what we will use to generate our visual response on the screen of the device. We have now explicitly told Bixby everything there is to know about how our capsule is going to work! It knows every concept and every result we are going to want. We now can teach Bixby how to handle speech.

Click training in the resources/en directory.


You will see a list of training examples I have provided the natural language model. We are effectively training Bixby to understand how to parse user phrases and turn them into useful input for our capsule. This is an application of machine learning! Notice the examples I have provided. I have made one: ‘generate a password for me’ with no numbers in it- this is to provide an example where Bixby should use our default input of four words from above, like the XKCD comic. I also provide numerous examples with varying numbers of words asking Bixby to generate a password in various ways. Notice how I have clicked on and highlighted the number in each training phrase and I have labeled this value as numWords! You will do this for each input your capsule needs- the more examples the better.

Bixby will use the labels and examples you provide to teach itself that when something sounds similar to your examples Bixby is being asked to open your capsule and feed the data that is similar to the labeled phrases you gave it to the capsule as input. Bixby is learning, so make sure to spend plenty of time here to make sure Bixby really gets it! Compiling the model will make Bixby learn each of your examples and you can view what Bixby’s output for your examples would be so you can be sure that Bixby has not mis-learned how to handle your examples. A well-trained model will make your users happier and your capsule easier to use.

This is my favorite part of the Bixby developer tools- it is very intuitive and fun to use, and it offers a look for machine learning enthusiasts into the underlying technologies behind Bixby. This is a defining attribute of the platform for me- it feels much more flexible than Alexa, which as a developer seems to encourage a more robotic and specific interface for its skills than the more flexible Bixby interfaces for capsules.

With your model trained and your concepts laid out, the last thing to do is to specify how Bixby should display our output. This is done with dialogs and layouts.

Dialogs define for Bixby’s interface the concepts (inputs) and the results. Therefore for each input you need there will be a dialog and for each result there will be a dialog.

NumWords therefore gets a dialog like so:

This is pretty bare-bones: We define a concept dialog (input dialog), tell it to look for NumWords (like in our training!) and we provide some template text for this type if we wanted to display something related to this input (in my project I ended up not using it).

The Password Result Dialog defines the dialog for our result. This one is more important for this project as it will populate our layout.

We define an output (result) dialog, have it match this time for our PasswordResult concept (passing in the output from calling generate with our numWords result) and then we tell Bixby what to write on the screen with the template text: Notice that this is the first bit of text in the above screenshot that appears when Bixby is displaying a result telling the user what it did for them!

The layouts for the visual part of the display (like this one, PasswordResult.layout.bml) look a lot like HTML! There are many documented UI widgets you can use such as pictures, hyperlinks, cards, and more. Here you can see we use a card to display the actual password, making it wrap onto the next line for long passwords and making them easy to copy. Down below in a div tag we display the password entropy. This is calculated using the formula from the XKCD comic, as described above. Finally we hyperlink to the comic that inspired this project as a way of giving credit.

A few more example passwords are shown below:

You can try it out for yourself in Bixby Studio! Simply click the icon that looks like a phone on the left hand side of the screen to open the Simulator, giving you an idea of what your capsule will look like on an actual Samsung device when the marketplace opens in a few months.

SHARING THE SOLUTION

This project can be found in its entirety on my GitHub! I hope this very early tutorial can help developers make their first steps into developing for Bixby, which I think has some very compelling development tools and technology behind it.

Making a Star Wars Poster Lithophane Lamp

Star Wars has some of the most iconic artwork of any movie franchise in history. Nowhere is this more apparent than in its theatrical posters. This lithophane lamp captures all nine main storyline Star Wars posters (with a blank standing in for Episode 9 until the official poster is released!) in a lovely desktop lamp that celebrates your favorite movies. Simply place an approximately 2.5 inch diameter LED puck light (like you would place under a cabinet) inside the indentation in the base and run the cord out the slot!

As always, you can edit this project on OnShape!

Also as usual you can download and print these files right now from my Thingiverse account.

Tutorial: Make a simple Alexa skill that uses a REST API

This tutorial adapts code from this excellent JavaScript cookbook prepared by AnalyticPhysics.com.

If you just want the code, click here.

If you just want to enable the Thornton Windchill skill we’ll make, click here!

UPDATE 11/25/18: IBM has deprecated the Weather Underground API. This tutorial has been updated to use the OpenWeatherMap API instead.

I spent some time back around Christmas trying to find an Alexa skill that would do one simple thing- tell me what it feels like outside. Sure there are plenty of weather apps, but I really wanted one that took the wind chill into account- that funny other temperature websites give you that’s usually called the “feels like temperature”. Finding none I decided to make my own.

The Amazon Echo surprised me with how easy it is to customize it and make your own skills (the apps of the Alexa world). Amazon has an amazing set of tools in Amazon Web Services, and it is brilliant that their microservices product, Lambda, is tied in with Alexa as a platform for hosting Alexa skills. This tutorial will show how to develop an extremely simple but useful Alexa skill- one that interfaces with the API of your favorite website, pulls down some information, and then has Alexa tell you the latest updates when you trigger your new skill.

First things first, some vocabulary:

API: Application Programming Interface. Many websites and tools have these so that developers can incorporate their functionality into their projects. If you’ve ever logged into a website using your Facebook login, you used the Facebook API. Google has many APIs for products like maps and search. Dropbox and OneDrive have APIs for saving files from apps. Dig around your favorite site and see if they have an API you’d like to use. For this tutorial, one that gives you information like headlines or the weather is best.

REST: Representational State Transfer. Essentially these APIs usually consist of a number of “links” like the ones you’d click on a webpage. They correspond to different functions in the API. There are endpoints for logins, uploading files, and other functions for all sorts of APIs. In the documentation for each API you should be given a list of endpoints to use. We will use Node.js to easily make requests to these links, known as HTTP Endpoints, to make our app work.

Node.js: A server-side JavaScript environment that is very handy for handling web functionality. You won’t need to install it for this tutorial but I recommend messing around with it. It makes running web servers and making HTTP requests very easy. This skill will be written in Node.js and we will use the code editor on the Amazon Lambda website.

Using a REST API

REST APIs are how we will get information from our website of choice to our app so that Alexa can say it out loud. I want to make a weather app of course, so the first place I turn is OpenWeatherMap. You can get started with their API for free and they provide all the information needed to calculate a ‘feels-like’ temperature like the one we want, namely the current temperature and the wind speed. You can follow along with me there or you should find the steps to using other APIs are roughly the same. Go to your favorite news site and see if they have an API you can use.

Most APIs have you register an account so that you can be given a key. This key prevents abuse of their system and also lets you track the usage of your app in many cases. This key will have to be part of your requests to their servers.

Let’s go to the documentation to figure out how to make a request to get the current weather. I see on the sidebar that they have “conditions” listed so I click there. Depending on what you want to do you will have to learn the “lingo” of your specific API. For example, the Dropbox API has (or at least had) two different ways of uploading files. One request was for upload, the other was for chunking up files, and they were both called different things. Be careful and read what each endpoint does to make sure you build the most optimized app!

The documentation gives a nice example of what a call to the current conditions endpoint looks like, and I immediately identify the data I want to parse out:


 

You will need to pass in the API key you receive when registering at OpenWeatherMap when you make this request for your location. Keep your API key a secret- malicious users can use your key to deplete your free calls to the API and get your account disabled! Additionally, I have added an ‘&units=imperial’ to get the results in US customary units, but you can switch this to metric if that is the system your country uses.
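As a rough illustration, a request URL of this kind can be assembled as below. The `buildWeatherUrl` helper and the sample zip code are assumptions for the example, and the endpoint path reflects OpenWeatherMap's current weather API as I understand it at the time of writing, so check it against their documentation.

```javascript
// Build a current-conditions request URL for a US zip code.
function buildWeatherUrl(zip, key) {
  const url = new URL('https://api.openweathermap.org/data/2.5/weather');
  url.searchParams.set('zip', `${zip},us`);
  url.searchParams.set('units', 'imperial'); // switch to 'metric' if preferred
  url.searchParams.set('appid', key);        // your secret API key
  return url.toString();
}

// 'YOUR_API_KEY' is a placeholder; never commit a real key to a public repo.
console.log(buildWeatherUrl('10001', 'YOUR_API_KEY'));
```

In a real deployment the key could be read from an environment variable such as `process.env.OPENWEATHERMAP_API_KEY` (a name chosen here for illustration) so that it never appears in the source.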

What you are seeing here is the response to the HTTP request. REST APIs often respond in what is called JSON format, which is basically a really nice way of formatting data so that everything has a key and a value. That way you can search for the data you want just by having the key (since you likely aren’t sure of what the value is!). So here I would want to write code that picks out “temp” and ‘speed’ so that I can calculate the ‘feels like’ temperature using these values.
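Picking values out by key looks like this in practice. The response body below is a trimmed, made-up sample in the shape the documentation shows, not real weather data.

```javascript
// Trimmed, invented sample of a current-conditions response body.
const body = '{"main":{"temp":28.4,"humidity":60},"wind":{"speed":12.5,"deg":310}}';

// Turn the JSON text into an object, then read values by key.
const json = JSON.parse(body);
const temp = json.main.temp;        // temperature, nested under "main"
const windSpeed = json.wind.speed;  // wind speed, nested under "wind"

console.log(temp, windSpeed);
```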

Knowing this, it’s time to start writing some code!

Getting Started with Amazon Lambda

You will need to go to the Amazon Lambda website and create an account. Lambda is for microservices- you write code and get a URL that triggers that code to run. This lets you do all sorts of fun things, like periodically check on and analyze data from a weather station in your back yard or send data to a database, all without having to worry about servers or hosting the script. Today we will use it to build our Alexa skill.


From the console, select “Create a Lambda function”


You don’t actually have to select a blueprint, but if you want to, go ahead and just use the blank function blueprint. Click “Configure triggers”.

We need to configure the app so that it uses Amazon’s kit for developing Alexa skills. You don’t need to have this installed or anything like that- it simply tells Lambda how exactly it is going to get triggered so it knows to accept requests from the Alexa Skills kit for triggering your app. Click the Alexa Skills Kit option and then click next.


Now we are into the meat and potatoes of the actual skill development! Give your skill a name and a quick description. Leave the runtime alone- it’s fine as is using Node.js. If you’re reading this in the future and thinking “Ah man, we’re on Node.js 7.2 and this script will never work now!” I apologize but I’m trying to keep things as future proof as possible.

The Lambda function code box is where we will finally begin writing our app.

If you would like to simply grab all the code at once, the Github link is here.

The bulk of every Lambda function is the handler. You can see the tiny sample code they give you already. This is what Lambda will run when it is triggered. All your functionality is called from in here.

Let’s think about what we need to accomplish in our handler:

  1. Make an HTTP request to get our weather information
  2. Parse that big JSON response to just get the feels like temperature
  3. Store this in a way Alexa can say what it is.

Not too hard! Let’s take a look at the handler I wrote and then I will break it down.

We keep the structure of the handler the same- the function definition looks the same, we just do more inside! First, we get the Node.js HTTP client so that we can make our requests. The URL module will make it easier for you to format your request URL, but I don’t use it here. I included it so you know that it exists in case you need to build more complicated URLs, such as substituting a user’s query. Here I am keeping it simple- simply paste in your zip code and your API key so that OpenWeatherMap gives you the conditions at your location. You can give your OpenWeatherMap link a try in your browser. It should display a JSON object of current conditions. Not all endpoints let you do this, but it can be a handy way to test your work.

The next section is the actual HTTP request. We make a GET request because we are GETTING something. If you want to upload something to a server you would make a POST or PUT request, and there are many other types of requests you can try. But for now we use the simplest- a GET request. You can see we use our HTTP client and set up a function with a single response parameter. This response is the server’s reply, and it carries the data that Alexa’s answer will be built from! The empty data string gives us somewhere to collect the JSON text as it arrives. For me it will hold the weather data behind the ‘feels like’ temperature. I tell Lambda that after it makes the GET request, every time it gets data from the server it should add it to my empty data string. This way we get the entire response, even when it arrives in several pieces.

Finally the real logic comes in handling the end of the response. When there is no more data we now need to parse our huge JSON object to get the specific data we want. By running JSON.parse I turn the entire string into an object of keys and values that I can now search through to get my temperature and wind speed values for our ‘feels like’ temperature. Notice how I index into the JSON response using dot-notation (e.g. I know that the temperature is stored in main looking at the response in the above screenshot so I get at it by writing json.main.temp, since the temp is in the json object under main). The formula for calculating a temperature with wind chill using US Customary units is as follows (you should be able to find a corresponding metric formula on the web):

Wind Chill = 35.74 + 0.6215T – 35.75(V^0.16) + 0.4275T(V^0.16) (Courtesy of MentalFloss)
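Translated directly into JavaScript, the formula might look like the sketch below. The function and parameter names are my own; T is the air temperature in degrees Fahrenheit and V is the wind speed in miles per hour.

```javascript
// Wind chill in degrees Fahrenheit from the formula above.
function windChill(tempF, windMph) {
  const v = Math.pow(windMph, 0.16);
  return 35.74 + 0.6215 * tempF - 35.75 * v + 0.4275 * tempF * v;
}

console.log(Math.round(windChill(30, 10))); // 21
```

As far as I know, this formula is only intended for temperatures at or below 50 degrees Fahrenheit and wind speeds above 3 miles per hour, so a skill used year-round may want to fall back to the plain temperature outside that range.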

Finally you can see that I placed our ‘feels like’ temperature value in the middle of a written response for Alexa to read. You can make this whatever you want (so long as it fits Amazon’s community guidelines). We then output this response. How does Output work? You define it yourself. Let’s take a look:

The function of output is simple: We are now giving Alexa a JSON object to read! It’s really JSON all the way down if you’re starting to catch on. This is mostly provided by Amazon’s documentation but let’s explain it anyway. Response is our JSON response when Alexa triggers our Lambda function, so that’s what it stores. It has a section for specifying how the output speech will work- PlainText is what you will use almost all the time and Alexa will simply read what you give it. The “Card” is what appears in the Alexa app when the user wants to check what people have been asking Alexa or to read the response again later. We specify a simple card- the name of the app (for identification) and the text that Alexa gave. That’s it! Finally we set the end session variable to true. We have no reason to tie up Alexa any longer waiting for any more input after we get our weather, so we tell the device that the skill is done. The final line simply says that if the response is successfully built to return it to the Alexa device calling the function.
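For reference, a response object of the kind described here can be sketched as follows. The `buildResponse` name is hypothetical and this is not a copy of the repo's output function; the field names follow the Alexa Skills Kit response format as I understand it.

```javascript
const SKILL_NAME = 'Thornton Windchill'; // shown as the card title

// Wrap a sentence in the JSON structure Alexa reads back to the user.
function buildResponse(text) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: text },           // what Alexa says
      card: { type: 'Simple', title: SKILL_NAME, content: text }, // shown in the Alexa app
      shouldEndSession: true                                      // no follow-up input expected
    }
  };
}

console.log(JSON.stringify(buildResponse('It feels like 21 degrees outside.'), null, 2));
```

The resulting object is what gets handed back to Lambda at the end of the handler.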

Go ahead and check your work against the whole code file on Github now.

Before you create your function you need to assign a role to it. This basically just lets Amazon know what permissions your skill needs. Go ahead and let Lambda create a role for you and give it a name. This is handled automatically. Select the parameters as shown:


Go ahead and click next. On the review page, click Create Function. You’re done! Click the ‘Test’ button on the top toolbar to create a test invocation of your new Lambda function and name it whatever you wish. You can leave the default inputs as this simple Alexa skill does not process any user input. From the dropdown menu select your new test event and press ‘Test’ to run it. You should see a successful result displayed on your screen:


 

More complicated Lambda functions will let you specify JSON test files that will simulate various inputs and outputs so you can test your skill.

But I wanted to hear Alexa run my skill!

I know! In order to do that though you need to add your Alexa skill to the Amazon Developer Console. Keep your Lambda tab open- you’ll see that there is now an identifier associated with your function called an ARN (Amazon Resource Name) that you’ll have to paste into the skill form. Amazon covers this process really well in step 2 of this great visual guide! When it asks you for your intent schema and sample utterances, go ahead and use the ones from my GitHub repository and modify them to taste. You will then get a chance to test your skill and hear Alexa say your response on your PC from your browser! Once you are done testing, submit the skill for certification and if Amazon approves it, your friends can find your skill and enable it. You can also create skills just for yourself and add them to your device now that you have the skill up on the developer console.

Have fun! Creating skills for yourself and for your friends can be a rewarding and fun aspect of owning an Alexa device. It can also get you some free swag. Let me know if you have any problems in the comments, and good luck!

American Roller Coaster Trivia Alexa Skill

If you just want to add the game to your Alexa device, click here!

THE PROBLEM

Sometimes the problem I am trying to solve isn’t something practical like a new educational game or 3D printed gadget. This time, the problem I was trying to solve is that I wanted a t-shirt. Amazon has monthly offers for swag if you develop skills for their Amazon Alexa product, which are basically like the platform’s “apps”. They let you have the digital assistant perform all sorts of tasks, like turn on and off your smart devices or interface with your favorite websites. You can also make Alexa play games with you, which is what I decided to do.

It’s no mystery from my website that I love roller coasters, so I decided that what Alexa really needed was a roller coaster trivia skill. This would let me get my feet wet into how Alexa development works and let me produce a fun game to share with my fellow roller coaster enthusiasts. The idea is that it will take a number of questions I have written about the best roller coasters in America, like what years rides opened, what park they are at, how fast they go, and more. Alexa asks five of these questions at random and keeps track of your score. Here’s some sample questions:

Name this roller coaster model whose namesake was used by Aboriginal Australians for hunting. (Answer: Boomerang)

The oldest roller coaster in America is what? (Answer: Leap-the-Dips)
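The core game logic, picking five questions at random and checking answers, can be sketched in a few lines. This is not the code from the Amazon sample the skill is built on; the `pickQuestions` and `isCorrect` helpers are hypothetical, and the placeholder entries stand in for the rest of the question bank.

```javascript
// Two questions from above plus placeholders for the rest of the bank.
const questions = [
  { prompt: 'The oldest roller coaster in America is what?', answer: 'Leap-the-Dips' },
  { prompt: 'Name this roller coaster model whose namesake was used for hunting.', answer: 'Boomerang' },
  { prompt: 'Placeholder question 3', answer: 'Placeholder answer 3' },
  { prompt: 'Placeholder question 4', answer: 'Placeholder answer 4' },
  { prompt: 'Placeholder question 5', answer: 'Placeholder answer 5' },
  { prompt: 'Placeholder question 6', answer: 'Placeholder answer 6' }
];

// Fisher-Yates shuffle on a copy, then take the first `count` questions.
function pickQuestions(all, count) {
  const pool = all.slice();
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

// Compare the spoken answer to the expected one, ignoring case and padding.
function isCorrect(question, spoken) {
  return spoken.trim().toLowerCase() === question.answer.toLowerCase();
}

const round = pickQuestions(questions, 5);
console.log(round.map((q) => q.prompt));
```

Keeping score is then a matter of counting how many times `isCorrect` returns true over the round.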

THE SOLUTION

I was happy to discover that I could write Alexa skills using one of my favorite tools, Node.js, and that I could set them up on one of my favorite platforms, Lambda, which is part of Amazon Web Services. Essentially you can develop an Alexa skill like you would any microservice. Of course, there are rules that you have to follow that you can learn from one of their many samples, which is where I found an excellent sample for a trivia game. My coaster trivia uses this code extensively, and it let me see a fully functioning Alexa skill that follows the best practices. I discovered all sorts of things- building responses, handling requests, and what Alexa has trouble saying. When you go to publish your Alexa skill you get a nice interface to test what you’ve done, which is where I found out that some roller coasters have names that do not play well with Alexa’s speech technology. I found, for example, that the roller coaster Rougarou is very hard to understand, so my questions related to it had to be dropped. If you are at all interested in making your own skills for Alexa, I recommend starting with one of their samples. It walks you through the whole process, from getting started with Lambda to creating intent schemas and testing.

SHARING THE SOLUTION

I made all of my code public by forking the original Amazon sample. You can get it here!

I really like the Alexa platform and have many more skills in development. I have a lot of plans as to how digital assistants may be valuable in the classroom. You can keep posted on my projects here, or on my new Alexa Skills page I linked from my Projects page.

Finally, you can add the skill to your Alexa device by clicking here, or search American Coaster Trivia in your Alexa app!