Amazon Machine Learning

Machine Learning is a branch of Artificial Intelligence in which computer systems learn from data and make predictions without being explicitly programmed and without human intervention.

I’ve covered Machine Learning in depth in this post, regression algorithms in this post, and classification algorithms in this post.

In this post, I would like to go over how we can use the Amazon Machine Learning service to train, evaluate, and deploy a model for batch and real-time predictions.

As per Amazon, Amazon Machine Learning Service is

A robust cloud-based service that makes it easy for developers of all skill levels to use Machine Learning Technology.

Before moving any further, let’s define a few ML terms that we will use in this post. When we say,

That’s a precise introduction, but the real question is

How Amazon Machine Learning works

To answer it, let’s dive deep into Amazon Machine Learning and build a model from scratch. Below is a step-by-step guide on how to use the Amazon Machine Learning service.

Step 1: Preparing Data Source

Let’s quickly create a model based on the following data, where we predict a candidate’s salary from their years of experience.

YearsExperience Salary
1.1 39343
1.3 46205
1.5 37731
2 43525
2.2 39891
2.9 56642
3 60150
3.2 54445
3.2 64445
3.7 57189
3.9 63218

You may click here to get the full data.
Data credits: All the data used in this tutorial is taken from the SuperDataScience data set.
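To get a feel for the relationship we are asking the service to learn, here is a minimal Python sketch that fits an ordinary least-squares line to the eleven sample rows shown above. It is only an illustration in plain Python and not how Amazon Machine Learning trains its model internally; the function name fit_line is made up for this example.

```python
# Sample rows copied from the table above: (YearsExperience, Salary).
SAMPLE_ROWS = [
    (1.1, 39343), (1.3, 46205), (1.5, 37731), (2.0, 43525),
    (2.2, 39891), (2.9, 56642), (3.0, 60150), (3.2, 54445),
    (3.2, 64445), (3.7, 57189), (3.9, 63218),
]


def fit_line(rows):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(SAMPLE_ROWS)
print(f"Salary ~ {slope:.0f} * YearsExperience + {intercept:.0f}")
```

A positive slope here simply confirms that salary tends to rise with experience in the sample, which is the pattern the hosted model should pick up as well.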

The Amazon Machine Learning service needs a data source to train a model, so let’s upload Salary_Data.csv to an S3 bucket. (You may refer to this AWS Tutorial for it.)

Let’s name our bucket bornshrewd-aws-machine-learning-demo

It contains Salary_Data.csv, as shown in the following picture.

AWS S3 Salary Data

To allow the Amazon Machine Learning service to access Salary_Data.csv in the S3 bucket, we need to set the proper permissions.

Let’s click on the Permissions tab of the S3 bucket and set a Bucket Policy so that the Amazon Machine Learning service can access the data in our bucket.

A sample policy file looks like this:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AmazonML_s3:ListBucket",
                "Effect": "Allow",
                "Principal": {
                    "Service": ""
                },
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::bornshrewd-aws-machine-learning-demo"
            },
            {
                "Sid": "AmazonML_s3:GetObject",
                "Effect": "Allow",
                "Principal": {
                    "Service": ""
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::bornshrewd-aws-machine-learning-demo/*"
            },
            {
                "Sid": "AmazonML_s3:PutObject",
                "Effect": "Allow",
                "Principal": {
                    "Service": ""
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::bornshrewd-aws-machine-learning-demo/*"
            }
        ]
    }

After updating the policy, our bucket looks like this:

AWS S3 Policy
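If you prefer to generate the bucket policy rather than edit it by hand, the following Python sketch builds the same three statements for any bucket name. Treat it as an illustration under assumptions: the function name build_bucket_policy is made up, and the service principal machinelearning.amazonaws.com is my assumption for the value the policy needs, so please confirm it against the AWS documentation before using it.

```python
import json

# Assumption: the Amazon ML service principal; confirm it in the AWS docs.
AMAZON_ML_PRINCIPAL = "machinelearning.amazonaws.com"


def build_bucket_policy(bucket_name, principal=AMAZON_ML_PRINCIPAL):
    """Build the three-statement bucket policy as a Python dict."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    # ListBucket applies to the bucket itself, the other two to its objects.
    grants = [
        ("s3:ListBucket", bucket_arn),
        ("s3:GetObject", f"{bucket_arn}/*"),
        ("s3:PutObject", f"{bucket_arn}/*"),
    ]
    statements = [
        {
            "Sid": f"AmazonML_{action}",
            "Effect": "Allow",
            "Principal": {"Service": principal},
            "Action": action,
            "Resource": resource,
        }
        for action, resource in grants
    ]
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_bucket_policy("bornshrewd-aws-machine-learning-demo")
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the Bucket Policy editor on the Permissions tab.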

That’s all we have to do to prepare the data. Let’s move to

Step 2: Amazon Machine Learning Service
If you are launching Amazon Machine Learning for the first time, you will see a console like this:

Amazon Machine Learning Start Screen

Click on Get Started. You will see a screen like this:

Amazon Machine Learning Setup

Click on View Dashboard

You will see a dashboard that lists all the objects. As we have not created anything yet, it doesn’t list any objects, as seen below.

Amazon Machine Learning Object Dashboard

Click on Create New and Choose Datasource

However, wait,

What is Datasource in Amazon Machine Learning

A Datasource in Amazon ML contains the metadata associated with the input data (Salary_Data.csv). It does not store a copy of the data itself, only a reference to its location.

Let’s create a Datasource in the following 5 steps.
Step 2a: Input Data
In this step, we provide the source of our input data. The source can be either S3 or Amazon Redshift. In our case, we choose S3 and provide our bucket location.

After filling in all the details, it looks like this:

Amazon Machine Learning Datasource

Click on the Verify button, and it will validate whether Amazon Machine Learning can access our S3 bucket.

On successful validation, you will see a message like this:

Amazon Machine Learning Datasource Validate

Click on Continue

Step 2b: Creating Schema
In this step, Amazon ML infers the column names and the data type of each column.

Amazon ML has automatically identified the data type of each column as Numeric, which is correct in our case.

For the question "Does the first line in your CSV contain the column names?", click on Yes.

The Schema screen looks like this:

Amazon Machine Learning Schema
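The type inference in this step can be approximated locally. The sketch below is a rough Python illustration, not Amazon ML’s actual logic: it treats the first CSV line as the header and labels a column NUMERIC when every value parses as a number, otherwise TEXT. The function name infer_schema and the two-type simplification are assumptions made for this example.

```python
import csv
import io


def infer_schema(csv_text):
    """Map each column name to 'NUMERIC' or 'TEXT' based on its values."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    schema = {}
    for index, name in enumerate(header):
        try:
            for row in data:
                float(row[index])  # raises ValueError for non-numeric text
            schema[name] = "NUMERIC"
        except ValueError:
            schema[name] = "TEXT"
    return schema


# First rows of the salary data, assuming a comma-separated file with a header.
sample = "YearsExperience,Salary\n1.1,39343\n1.3,46205\n1.5,37731\n"
print(infer_schema(sample))
```

Running a check like this before uploading can catch a stray non-numeric value that would otherwise surprise you on the Schema screen.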

Click on Continue

Step 2c: Target
In this step, Amazon Machine Learning asks for the dependent variable, that is, the value to be predicted. In our case, choose Salary, as seen below.

Amazon Machine Learning Target

Click on Continue

Step 2d: Row Identifier
A row identifier is used for reference purposes. We don’t have any identifier in our data, so let’s go with the default option, No, as chosen below.

Amazon Machine Learning ROW_ID

Click on Review

Step 2e: Review
This page summarises all the options we have chosen, for review, as shown below.

Amazon Machine Learning Review

You may edit the options if required. In our case, click on Create Datasource.

We now see a screen which says that Datasource creation is Pending.

It will take a while to create the Datasource. Let’s wait for it to finish and go back to our dashboard.

When it finishes, you will see a screen like this:

Amazon Machine Learning

Let’s move on to

Step 3: Create ML Model
Click on Create New > ML Model

You will go through the following 6 steps.

Step 3a: Input Data
In this step, you have to specify the datasource.
Choose "I already created a datasource pointing to my S3 data".

Amazon Machine Learning

Then click on Salary_Data.csv. You will see

Amazon Machine Learning input

Click on Continue

Step 3b: ML Model Settings
In this step, Amazon ML selects a model type for you based on the target.
In our case, the target Salary is numeric, so it is REGRESSION.

For Training and Evaluation Settings, let’s choose Custom so that we can see each setting, as shown below.

Amazon Machine Learning Settings

Click on Continue

Step 3c: Recipe
In this step, Amazon ML helps us transform our data to optimise the ML model by using a default Recipe, which looks like this:

  {
    "groups": {
      "NUMERIC_VARS_QB_50": "group('YearsExperience')"
    },
    "assignments": {},
    "outputs": [
      "quantile_bin(NUMERIC_VARS_QB_50,50)"
    ]
  }

quantile_bin groups the numeric values into 50 bins. You may refer to this document for more details on Recipes.
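To make the idea of quantile binning concrete, here is a small plain-Python sketch. It assigns each value to one of a given number of bins so that every bin holds roughly the same number of values. It is a simplified, rank-based illustration and not Amazon ML’s implementation; the function below and the use of 4 bins instead of 50 are choices made for readability.

```python
def quantile_bin(values, num_bins):
    """Return a bin index (0..num_bins-1) for each value, based on its rank."""
    # Sort positions by value so equal-sized slices of the ranking form bins.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, position in enumerate(order):
        bins[position] = rank * num_bins // len(values)
    return bins


# YearsExperience values from the sample rows shown earlier.
years = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9]
print(quantile_bin(years, 4))
```

The model then sees the bin index instead of the raw number, which lets a linear model capture non-linear effects. Note that this rank-based version may place equal values in different bins when a tie straddles a bin boundary.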

In our case, let’s not do any processing of our data and use a simple recipe like this:

  {
    "outputs": [
      "ALL_NUMERIC"
    ]
  }

This means that all numeric data is passed through as it is.

Click on Verify to validate the Recipe. You will see the message "Recipe is valid", as shown below.

Amazon Machine Learning Recipe

Click on Continue

Step 3d: Advanced settings
In this step, we can specify the maximum size of our model, the maximum number of passes over the data, the shuffle type, etc.

Leave the default settings, as shown below.

Amazon Machine Learning Advanced settings

Click on Continue

Step 3e: Evaluation
In this step, it will ask whether we want to evaluate our model. If we choose Yes, it splits our data into a training set and an evaluation set.

Choose Yes and keep the defaults for everything else.
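The split itself is simple to picture. Here is a minimal Python sketch of a sequential split, where the first part of the rows is used for training and the rest is held back for evaluation. The 70/30 ratio is my assumption about the default, so check the values shown on the Evaluation screen; split_rows is a made-up helper name.

```python
def split_rows(rows, train_percent=70):
    """Split rows in order: the first part trains, the rest evaluates."""
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for ten data rows
train, evaluation = split_rows(rows)
print(len(train), len(evaluation))
```

Because the evaluation rows are never shown to the model during training, the error measured on them is a fairer estimate of how the model behaves on new data.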

Amazon Machine Learning Evaluation

Click on Review

Step 3f: Review
In this step, you can review and edit all the settings from the previous steps. If everything looks good, click Create ML model.

You will be redirected to the ML model summary page, which shows the status as Pending. Wait a while as our model is being created.

Once it is completed, you can click on Download Log to check the RMSE, learning rate, etc. This page also shows the RMSE (root mean square error) of the model on your evaluation data. You may refer to this document for more details on RMSE.
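For reference, RMSE is the square root of the average squared difference between actual and predicted values. The Python sketch below computes it for a few made-up salaries; the numbers are hypothetical and are not results from the model built in this tutorial.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical salaries and predictions, for illustration only.
actual = [40000, 50000, 60000]
predicted = [42000, 49000, 58000]
print(round(rmse(actual, predicted), 2))
```

RMSE is expressed in the same unit as the target (salary here), and a lower value means predictions are closer to the actual values.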

Scroll down to predictions section, where you will see three options

Amazon Machine Learning Prediction

Let’s click on Try real-time predictions.

Amazon Machine Learning Prediction Results
On the Try real-time predictions screen, enter Years Experience and click on Create prediction. On the right-hand side, you will see the predicted result along with other details such as the algorithm and the predictive model type.

Congratulations, you have your model up and running on AWS infrastructure.

If you go to the Dashboard, you will see five objects

Amazon Machine Learning Dashboard

You may go ahead and create an endpoint for real-time predictions for your app.
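Once an endpoint exists, an app could call it roughly as in the Python sketch below. It assumes the boto3 library is installed and AWS credentials are configured; ml_model_id and endpoint_url are placeholders you would copy from the ML model summary page, and build_record and predict_salary are hypothetical helper names.

```python
def build_record(years_experience):
    """Amazon ML expects every record value to be a string."""
    return {"YearsExperience": str(years_experience)}


def predict_salary(ml_model_id, endpoint_url, years_experience):
    """Call the real-time endpoint; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_record works without boto3

    client = boto3.client("machinelearning")
    response = client.predict(
        MLModelId=ml_model_id,
        Record=build_record(years_experience),
        PredictEndpoint=endpoint_url,
    )
    # For a regression model the number comes back as 'predictedValue'.
    return response["Prediction"]["predictedValue"]


print(build_record(4.5))
```

Keeping the record-building step separate makes it easy to check the payload without making a billable call to the endpoint.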

In conclusion, I would like to say that the Amazon Machine Learning service can shorten the delivery time of an ML project. It comes with sensible and tested recipes, configurations, etc. We can quickly expose a model as a RESTful endpoint and start making predictions.

On the downside, Amazon Machine Learning service doesn’t give us control over hyperparameters of an algorithm. Also, there are limited algorithms available to build a model which are as follows,

