Classification in Machine Learning

Machine Learning is a branch of Artificial Intelligence in which computer systems are given the ability to learn from data and make predictions without being explicitly programmed and without the need for human intervention.


I’ve discussed Machine Learning deeply in this post and regression in this post.


In this post, I would like to brush over common Machine Learning Classification Techniques. However, first, let’s answer

What is Classification problem in Machine Learning

In classification, we predict the category a data point belongs to, unlike regression, where we predict continuous real values.


There are multiple classification techniques, but in this article we will look into the following ones:

Logistic Regression
K-Nearest Neighbors (K-NN)
Support Vector Machine (SVM)
Kernel SVM
Naive Bayes
Decision Tree Classification
Random Forest Classification


Let’s look into each one of them individually beginning with

Logistic Regression in Machine Learning

Logistic Regression is a linear classifier built on the following formula,

p = 1 / (1 + e^(-(b0 + b1*x)))



The above equation is the output of applying the Sigmoid function to a linear equation (the details of which are outside the scope of this tutorial).
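To make the formula concrete, here is a minimal sketch, assuming made-up coefficients b0 and b1 purely for illustration:

# Sigmoid applied to a linear equation (illustrative sketch, coefficients are made up)
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -10.0, 0.3  # hypothetical coefficients of the linear equation b0 + b1*x

ages = np.array([20, 30, 40, 50])
print(sigmoid(b0 + b1 * ages))  # probabilities between 0 and 1, increasing with age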


Let’s understand this concept intuitively by taking an example.


Suppose we have Age vs. Action data, and we want to predict whether an intended action (e.g., clicking on a mail) is performed by a customer of a particular age.


Plotting the data it looks something like,

Logistic Regression Data


Based on the age, we want to predict the probability (or likelihood) that a person will click on the mail (the action) or not.


Applying logistic regression formula to the above data, we get a curve which looks like

Logistic Regression in Machine Learning


The green curve is the best-fitting curve produced by the logistic regression equation. We can use it to predict the probability for a particular age.


Let's try predicting the probabilities for x = 20, 30, 40, 50.

Logistic Regression Sample


Projecting x = 20, 30, 40, 50 onto the curve, we get probabilities of .07, .2, .85 and .9 respectively.


To convert those probabilities into predictions, let's choose a threshold of .5 (i.e., 50%): any probability below .5 is predicted as 0 (will not click), and any probability of .5 or above is predicted as 1 (will click).


Based on the above threshold, x = 20, 30 will not click on the email and x = 40, 50 will click on the email. That is how we predict the likelihood of a user clicking on a mail.
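As a quick sketch, with the probabilities read off the plot above and 0.5 as our chosen threshold, the conversion looks like:

# Converting probabilities into class predictions using a 0.5 threshold
import numpy as np

probabilities = np.array([0.07, 0.2, 0.85, 0.9])  # for x = 20, 30, 40, 50
predictions = (probabilities >= 0.5).astype(int)
print(predictions)  # [0 0 1 1] -> x = 20, 30 will not click; x = 40, 50 will click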


To dive further into Logistic Regression, let's create a model that predicts whether a person will buy a particular product (e.g., a newly launched SUV) based on Age and Estimated Salary.


Sample from complete data set looks like

User ID Gender Age EstimatedSalary Purchased
15624510 Male 19 19000 0
15810944 Male 35 20000 0
15668575 Female 26 43000 0
15603246 Female 27 57000 0
15804002 Male 19 76000 0
15728773 Male 27 58000 0
15598044 Female 27 84000 0
15694829 Female 32 150000 1
15600575 Male 25 33000 0
15727311 Female 35 65000 0
15570769 Female 26 80000 0
15606274 Female 26 52000 0
15746139 Male 20 86000 0
15704987 Male 32 18000 0


Click here to get the full data.
Data credits: all the data used in this tutorial is taken from the SuperDataScience data set.


Below is a step-by-step approach for analyzing the data, beginning with


Step 1: Loading and processing the data

# Logistic Regression

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 


Step 2: Fitting Logistic Regression to training data

# Fitting Logistic Regression to Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)


Step 3: Predicting the Test set results

# Predicting the Test set results
y_pred = classifier.predict(X_test)


Step 4: Making the Confusion Matrix
Before diving into the Confusion Matrix, let's understand the following terms:


True Positive (TP): the model predicted 1 and the actual value is 1
False Positive (FP, Type I error): the model predicted 1 but the actual value is 0
True Negative (TN): the model predicted 0 and the actual value is 0
False Negative (FN, Type II error): the model predicted 0 but the actual value is 1


The Confusion Matrix basically tells us about the True Positive(s), False Positive(s), True Negative(s) and False Negative(s).


Code to calculate Confusion Matrix

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)


The output of the Confusion Matrix looks like

array([[65,  3],
       [ 8, 24]])


The above array can also be visualized as

                Y Predicted = 0                    Y Predicted = 1
Y Actual = 0    True Negative                      False Positive (Type I error)
Y Actual = 1    False Negative (Type II error)     True Positive


Hence, we have 65 + 24 = 89 correct predictions and 8+3 = 11 incorrect predictions.


We can also calculate a few important metrics from the Confusion Matrix, like

Accuracy = (TP + TN) / Total Observations
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Since we are discussing metrics, two more worth looking into are


AUC - Area Under Curve: the area under the ROC curve; the closer it is to 1, the better the classifier separates the two classes.


F1 Score: the harmonic mean of Precision and Recall, i.e., F1 = 2 * (Precision * Recall) / (Precision + Recall).
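A minimal sketch of computing these metrics with scikit-learn, assuming the classifier, X_test, y_test and y_pred defined in the steps above:

# Calculating common classification metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1 Score :', f1_score(y_test, y_pred))

# AUC needs probability scores rather than hard class labels
y_prob = classifier.predict_proba(X_test)[:, 1]
print('AUC      :', roc_auc_score(y_test, y_prob))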


Step 5: Visualisation of Logistic Regression
Step 5a: Visualising Training Data

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Logistic Regression (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


Logistic Regression Sample

Step 5b: Visualising Test Data

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Logistic Regression (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


Logistic Regression Sample


The goal of the Logistic Regression classifier is to classify users into the right category. For a new user, our classifier will predict whether he or she belongs to the green region (will purchase) or the red region (will not purchase), based on age and estimated salary.
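For instance, here is a minimal sketch of classifying a hypothetical new user (age 30, estimated salary 87,000 — made-up values), remembering to apply the same feature scaling that was used for training:

# Predicting a single new user (values are hypothetical)
import numpy as np

new_user = np.array([[30, 87000]])          # [Age, EstimatedSalary]
new_user_scaled = sc_X.transform(new_user)  # reuse the scaler fitted on the training set

print(classifier.predict(new_user_scaled))        # 0 or 1
print(classifier.predict_proba(new_user_scaled))  # [P(will not purchase), P(will purchase)]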


Here is a brief explanation of the above graphs: each point is an observation from the data set, colored by its actual class (red = did not purchase, green = purchased). The colored background regions are the classifier's predictions, and the straight border between the red and green regions is the decision boundary learned by Logistic Regression. Points that fall in a region of the other color are the misclassified observations we counted in the Confusion Matrix.


I think that is enough information for Logistic Regression, let’s look into

K-Nearest Neighbors (K-NN) Classification in Machine Learning

Suppose our data points look like,

K-Nearest Neighbors Data

Also, we have a new point marked as blue as shown below.

K-Nearest Neighbors Data and point

We want to predict whether it belongs to the Red set or the Green set. To do that, we will apply the K-NN algorithm in the following steps.

Step 1: Choose the number of neighbors K. Suppose K = 5
Step 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance
Step 3: Among these K neighbors, count the number of data points in each category
Step 4: Assign the new data point to the category with the most neighbors among those K


Analyzing the above data with K = 5, the blue dot has three neighbors in the Red category and two in the Green category. Hence, by applying the K-Nearest Neighbors algorithm, it is assigned to the Red category, as shown below.

K-Nearest Neighbors Data and point + Solution
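Before switching to scikit-learn, here is a minimal from-scratch sketch of those four steps on made-up toy data:

# K-NN from scratch (illustrative sketch on toy data)
import numpy as np
from collections import Counter

def knn_predict(X, y, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every known point
    distances = np.sqrt(((X - new_point) ** 2).sum(axis=1))
    # Take the K nearest neighbors
    nearest_labels = y[np.argsort(distances)[:k]]
    # Steps 3 & 4: count the categories and pick the majority
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: 0 = Red category, 1 = Green category (made-up values)
X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X, y, np.array([3, 3]), k=5))  # -> 0 (Red wins 3 votes to 2)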


Let's try analyzing the same dataset we used for Logistic Regression, this time with a K-Nearest Neighbors classifier; the code looks like

# K-Nearest Neighbors (K-NN) Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 


# Fitting K-Nearest Neighbors to Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
## p = 2 for euclidean distance
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('K-Nearest Neighbors (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('K-Nearest Neighbors (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[64,  4],
       [ 3, 29]])

Here we have only 7 (4 + 3) incorrect predictions, which is an improvement over the 11 wrong predictions of Logistic Regression.


Visualising Training Data K-Nearest Neighbors Train


Visualising Test Data K-Nearest Neighbors Test

Observations from the above graph


Moving on let’s look into

Support Vector Machine (SVM) Classification in Machine Learning

Suppose our data points look like,

Support Vector Machine Data


We want to separate the data points linearly by drawing a line between them. But wait: there can be any number of lines drawn between Category 1 and Category 2. Which one should we choose?


Here SVM comes to the rescue: it helps us find the best line, or best decision boundary, for separating our space into classes.


SVM searches for the line having the maximum margin from the closest point of each category; graphically it looks like

Support Vector Machine Line


In the above graph, the points of each category that lie closest to the boundary are the support vectors; they are the only points that determine where the boundary goes. The line between them is the maximum margin line (the maximum margin hyperplane in higher dimensions), and the distance between it and the support vectors is the margin that SVM maximizes.
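Here is a minimal sketch of this idea on made-up toy data, inspecting which points a fitted linear SVM chooses as its support vectors:

# Inspecting support vectors of a linear SVM (illustrative sketch on toy data)
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]])  # made-up 2D points
y = np.array([0, 0, 0, 1, 1, 1])                                # two linearly separable categories

svm = SVC(kernel = 'linear')
svm.fit(X, y)

print(svm.support_vectors_)       # the points that define the maximum margin
print(svm.coef_, svm.intercept_)  # w and b of the separating line w.x + b = 0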


Let's try analyzing the same dataset we used for Logistic Regression, this time with a Support Vector Machine; the code looks like

# Support Vector Machine (SVM)

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 


# Fitting Support Vector Machine (SVM) to Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)

classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Support Vector Machine (SVM) (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Support Vector Machine (SVM) (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[66,  2],
       [ 8, 24]])

Here we have 10 (8 + 2) incorrect predictions, which is pretty decent.


Visualising Training Data Support Vector Machine Train


Visualising Test Data Support Vector Machine Test

Observations from the above graph


Can we improve the prediction accuracy of SVM?


Maybe we can, by trying different kernels. Let's do that in

Support Vector Machine (SVM) with a Non-Linear Kernel, or Kernel SVM

Suppose our data points look like,

Kernel SVM Data


It can't be separated by a straight line, implying our data is not linearly separable. However, the assumption, or prerequisite, of the linear SVM we just used is that the data is linearly separable.


So, how can we apply SVM in this scenario?


A simple way out would be to add an extra dimension to our data to make it linearly separable. We can do that in the following two steps:

Step 1: Map the data to a higher-dimensional space in which it becomes linearly separable.
Step 2: Fit a linear SVM in that space and project the resulting decision boundary back to the original space.


Applying both steps, we get a circle separating our data, which looks like

Kernel SVM Data with boundary


The caveat of the above approach is that mapping to a higher-dimensional space can be compute-intensive.


Can we do better, i.e., can we apply SVM without going to a higher dimension?


Yes, we can by applying The Kernel Trick.


The detailed explanation of The Kernel Trick is beyond the scope of this tutorial. Just understand that we take a Gaussian RBF Kernel, the formula of which looks like

K(x, l) = e^(-||x - l||^2 / (2 * sigma^2))

(where l is a landmark point and sigma controls how quickly the kernel value decays with distance)


and separate our dataset, i.e., build the decision boundary.


Basically, by figuring out optimum values of sigma and the landmark points, we will plot a figure that separates our dataset. (A detailed explanation of how this is achieved is beyond the scope of this tutorial.)


Remember, by using the Gaussian RBF kernel trick we aren't doing any computation in a higher-dimensional space.
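A minimal sketch of what the kernel computes, where sigma is a parameter we tune and the landmark is just an illustrative point; note that only distances in the original space are used, never coordinates in any higher-dimensional space:

# Gaussian RBF kernel (illustrative sketch)
import numpy as np

def gaussian_rbf_kernel(x, landmark, sigma = 1.0):
    # K(x, l) = exp(-||x - l||^2 / (2 * sigma^2))
    squared_distance = np.sum((x - landmark) ** 2)
    return np.exp(-squared_distance / (2 * sigma ** 2))

x = np.array([1.0, 2.0])         # a data point (made-up values)
landmark = np.array([0.0, 0.0])  # a landmark point (made-up values)
print(gaussian_rbf_kernel(x, landmark))  # close to 1 near the landmark, tends to 0 far away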


Also, the Gaussian RBF kernel is not the only kernel we can choose. We have other options as well, such as the Linear kernel, the Polynomial kernel, and the Sigmoid kernel.


I guess that’s enough of theory.


Let's try analyzing the same dataset we used for Logistic Regression, this time with Kernel SVM; the code looks like

# Kernel SVM

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 


# Fitting Kernel SVM to Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
## RBF is a Gaussian Kernel

classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Kernel SVM (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Kernel SVM (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[64,  4],
       [ 3, 29]])

Here we have only 7 (4 + 3) incorrect predictions, which is a nice improvement over the linear SVM (10 wrong predictions) and Logistic Regression (11 wrong predictions), and on par with K-NN (also 7 wrong predictions).


Visualising Training Data Kernel SVM Train


Visualising Test Data Kernel SVM Test

Observations from the above graph


Moving on let’s explore more and dig into

Naive Bayes Classification in Machine Learning

To understand the Naive Bayes classifier, let's look at the data in the graph below.

Naive Bayes Data


We have a Salary vs. Age plot in which red dots are people who walk to the office and green dots are people who drive to the office.


Suppose we have a new data point (marked as grey in the above graph); we want to classify whether this person walks to work or drives to work.


Let's try solving this problem using Bayes' Theorem. However, before that, a quick side note.


Side Note: Bayes' Theorem is represented by the equation

P(A|B) = P(B|A) * P(A) / P(B)

(A detailed explanation of how this equation is derived is beyond the scope of this tutorial.)


Naive Bayes will broadly perform three steps for our problem. Let's look into them, starting with


Step 1: Finding the probability that a person Walks to the office given features X, i.e., P(Walks|X)
Where X represents the features of a person; in our scenario, Age and Salary.

By Bayes' theorem,

P(Walks|X) = P(X|Walks) * P(Walks) / P(X)


Where,

P(Walks) = Prior Probability
It represents the probability of a person walking to work, regardless of their features, i.e.,

P(Walks) = Number of Walkers / Total Observations


P(X) = Marginal Likelihood
To calculate the Marginal Likelihood, let's draw a circle of an arbitrary radius of our choice around the new observation, like in the figure below.

Naive Bayes Data


Look at all the points inside the circle; in our case there are 4 of them (3 Red, 1 Green). We will deem all those points similar in features to the grey data point that we want to classify.


P(X) will then be the probability of any random point falling inside this circle, i.e.,

P(X) = Number of Similar Observations / Total Observations


P(X|Walks) = Likelihood
P(X|Walks) means: what is the probability that a person who walks to the office exhibits features X?


Looking at the same circle, P(X|Walks) is the probability of a random point falling inside the circle, given that the person walks, i.e.,

P(X|Walks) = Number of Similar Observations Among Those Who Walk / Total Number of Walkers

In essence, the Number of Similar Observations Among Those Who Walk is just the number of Red points falling inside the circle.


(We are only considering people who walk to work, i.e., the Red points.)


P(Walks|X) = Posterior Probability
Which is equal to

P(Walks|X) = P(X|Walks) * P(Walks) / P(X)


Step 2: Finding the probability that a person Drives to the office given X

In exactly the same way,

P(Drives|X) = P(X|Drives) * P(Drives) / P(X)

Calculating all the parts of the right-hand side gives us P(Drives|X).


Step 3: Compare both probabilities and decide
i.e., P(Walks|X) vs. P(Drives|X)

For our grey point, the circle contains more Red (walking) observations than Green (driving) ones, so

P(Walks|X) > P(Drives|X)


Hence, we will classify the new data point as: this person will walk to work.
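A minimal sketch of those three steps with made-up counts (30 observations in total, 10 walkers and 20 drivers, and a circle containing 4 similar observations of which 3 walk):

# Naive Bayes by hand (all counts are hypothetical, purely for illustration)
total_observations = 30
walkers, drivers = 10, 20
similar = 4              # points inside the circle
similar_walkers = 3      # red points inside the circle
similar_drivers = similar - similar_walkers

# Step 1: P(Walks|X)
p_walks = walkers / total_observations        # prior probability
p_x = similar / total_observations            # marginal likelihood
p_x_given_walks = similar_walkers / walkers   # likelihood
p_walks_given_x = p_x_given_walks * p_walks / p_x

# Step 2: P(Drives|X)
p_drives = drivers / total_observations
p_x_given_drives = similar_drivers / drivers
p_drives_given_x = p_x_given_drives * p_drives / p_x

# Step 3: compare and decide
print(p_walks_given_x, p_drives_given_x)  # 0.75 vs 0.25 -> classify as Walks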


That's a lot of theory, I know.


Let's try analyzing the same dataset we used for Logistic Regression, this time with Naive Bayes; the code looks like

# Naive Bayes

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 


# Fitting Naive Bayes to Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Naive Bayes (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Naive Bayes (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[65,  3],
       [ 7, 25]])

Here we have 10 (7 + 3) incorrect predictions.


Visualising Training Data Naive Bayes Train


Visualising Test Data Naive Bayes Test


Observations from the above graph


Smooth, right? Only two more algorithms left.


Let’s look into the penultimate algorithm, which is

Decision Tree Classification in Machine Learning

Decision Trees are very good for interpretability and can be used to analyze data which looks like,

Decision Tree Classification Data


Decision Tree basically cuts our data into slices which looks like,

Decision Tree Classification Data


But how are those splits selected?


Splits are chosen so that each one maximizes the number of data points of a single category in the resulting regions. In other words, each split tries to *minimize entropy* (the details of which are beyond the scope of this post).
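As a rough sketch of what minimizing entropy means, here is a small example with made-up label groups; a pure group has low entropy, while an evenly mixed group has the highest:

# Entropy of a group of labels (illustrative sketch)
import numpy as np

def entropy(labels):
    # Shannon entropy: 0 bits for a pure group, 1 bit for a 50/50 mix of two classes
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([0, 0, 0, 0, 1])))     # ~0.72 (mostly one class)
print(entropy(np.array([0, 1, 0, 1, 0, 1])))  # 1.0 (evenly mixed)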


Let’s construct a decision tree based on the above split, which looks like

Decision Tree


Based on the above graph, the terminal leaves predict whether a data point belongs to the Red class or the Green class.


Let's try analyzing the same dataset we used for Logistic Regression, this time with Decision Tree classification; the code looks like

# Decision Tree Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 

## We don't need feature scaling in case of Decision Tree because it's not based on Euclidean distance.
## But since we are plotting with the higher resolution we will keep this step

# Fitting Decision Tree to Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion= 'entropy', random_state=0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Decision Tree (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Decision Tree (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[62,  6],
       [ 3, 29]])

Here we have 9 (6 + 3) incorrect predictions.


Visualising Training Data Decision Tree Train


Visualising Test Data Decision Tree Test


Observations from the above graph


Can we improve this model further?


Imagine if we had used 500 decision trees for prediction instead of just one.


Let’s look into the last algorithm, which is

Random Forest Classification

Random Forest is a form of Ensemble Learning.


In Ensemble Learning we combine several algorithms, or the same algorithm multiple times, to build something more powerful.


Below are the steps for building a Random Forest
Step 1: Pick K random data points from the training set.
Step 2: Build a Decision Tree associated with these K data points.
Step 3: Repeat Steps 1 and 2 N times to build N trees.
Step 4: Use all N trees to predict the category of the new data point, then choose the category that wins the majority vote.


This tends to improve prediction accuracy because we are aggregating the predictions of N trees instead of relying on a single one.
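Here is a minimal sketch of those steps built from scikit-learn pieces (bootstrap sampling plus a handful of decision trees voting), assuming the X_train, y_train and X_test arrays from the earlier preprocessing:

# Hand-rolled random forest via majority voting (illustrative sketch)
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
n_trees = 10
trees = []

# Steps 1-3: build N trees, each on a random bootstrap sample of the training set
for _ in range(n_trees):
    idx = rng.choice(len(X_train), size = len(X_train), replace = True)
    tree = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step 4: every tree votes and the majority wins (ties go to class 1 here)
votes = np.array([tree.predict(X_test) for tree in trees])
y_pred_forest = (votes.mean(axis = 0) >= 0.5).astype(int)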


Microsoft Kinect is a great example; it uses the Random Forest algorithm to sense the movement of body parts.


Let's try analyzing the same dataset we used for Logistic Regression, this time with Random Forest classification; the code looks like

# Random Forest Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values # Extracting Age and Estimated Salary
y = dataset.iloc[:, 4].values # Extracting Purchased

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature scaling
from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test) 

## We don't need feature scaling in case of Random Forest because it's not based on Euclidean distance.
## But since we are plotting with the higher resolution we will keep this step

# Fitting Random Forest to Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion= 'entropy', random_state=0)
## n_estimators => Number of trees in forest
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Random Forest (Training Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.45, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Random Forest (Test Set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


The Confusion Matrix looks like

array([[63,  5],
       [ 3, 29]])

Here we have 8 (5 + 3) incorrect predictions, which is an improvement over Decision Tree Classification.


Visualising Training Data Random Forest Train


Visualising Test Data Random Forest Test


Observations from the above graph


Conclusion
In conclusion, we need to choose the model with the most correct predictions while guarding against overfitting.


Looking at all the classification models, analyzing their confusion matrices for accuracy as well as their decision boundaries, it seems we should choose Kernel SVM for our problem.
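As a sanity check, here is a short sketch that fits all seven classifiers on the same split and prints their test-set accuracy, assuming the preprocessed X_train, X_test, y_train, y_test from above:

# Comparing all classifiers on the same train/test split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

models = {
    'Logistic Regression': LogisticRegression(random_state = 0),
    'K-NN': KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2),
    'SVM (linear)': SVC(kernel = 'linear', random_state = 0),
    'Kernel SVM (rbf)': SVC(kernel = 'rbf', random_state = 0),
    'Naive Bayes': GaussianNB(),
    'Decision Tree': DecisionTreeClassifier(criterion = 'entropy', random_state = 0),
    'Random Forest': RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))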


Last but not least, we may consider the following guidelines before picking a model


That's it, folks. That's how classification in Machine Learning rocks.
