Approaching Classification With Neural Networks

Narasimha Jwalapuram 04 May, 2022 • 6 min read

This article was published as a part of the Data Science Blogathon.

Introduction on Classification

Classification is one of the basic tasks that a machine can be trained to perform. This can include classifying whether it will rain or not today using the weather data, determining the expression of the person based on the facial image, or the sentiment of the review based on text etc. Classification is extensively applied in various applications thus making it one of the most fundamental tasks under supervised machine learning.

There are various algorithms used to perform classification based on the type of dataset being considered. This ranges from tree-based classifiers like Random Forests and decision trees, to gradient boosted algorithms like XGboost or neural networks based classifiers. In this blog, let’s explore how to use neural networks to build custom classifiers for a tabular dataset. The advantage of using neural networks is that we can easily detect and learn unknown patterns present in the data.

Before we jump into building the classifier, let's take a quick look at the dataset.


About the Dataset

The dataset we are using to train our model is the Iris dataset. It consists of 150 samples belonging to 3 species of Iris flower: Iris setosa, Iris versicolor and Iris virginica. The dataset is multivariate: each sample provides 4 features, namely sepal length, sepal width, petal length and petal width. We need to use these 4 features to classify the iris species, so we will train a multi-class classification model on this dataset. More information about this dataset can be found here.
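The original walkthrough assumes a local iris_dataset.csv file (loaded in the next section). If you do not have the CSV at hand, a minimal alternative sketch, not part of the original article, is to build an equivalent DataFrame from the copy of the Iris data bundled with scikit-learn; note that its column and species names differ slightly from the CSV used below.

# Optional alternative: build an equivalent DataFrame from scikit-learn's bundled Iris data
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris(as_frame=True)
df = iris.frame  # 4 feature columns plus an integer 'target' column
# Map the integer targets to species names (these differ slightly from the CSV's 'Iris-...' labels)
df["target"] = df["target"].map(dict(enumerate(iris.target_names)))
print(df.head())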

Getting Started with Classification

Let’s get started by first importing required libraries,

import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow.keras import losses
from tensorflow.keras import metrics
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from tensorflow import keras
from sklearn.preprocessing import LabelEncoder

Check the installed version of TensorFlow as follows,

print(tf.__version__)

Next, we need to download and extract the dataset from here. Then move it to the same location as the notebook/script, or note down the path to the dataset. Now read the CSV file from that location,

file_path = 'iris_dataset.csv'
df = pd.read_csv(file_path)
df.head()

We can see that our dataset has 4 input features and 1 target variable. The target variable consists of 3 classes i.e. ‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’.
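As an optional sanity check, not part of the original walkthrough, we can confirm the class distribution and column types, assuming the label column is named target as in the code that follows:

# Quick sanity check: class distribution and column dtypes
print(df['target'].value_counts())
print(df.dtypes)

Each species has 50 samples, so the classes are balanced. Now let's further prepare our dataset for model training.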

Data Preparation

First, let's check whether our dataset contains any null values.

print(df.isnull().sum())

There are no null values. Therefore we can continue to separate the inputs and targets.

X = df.drop('target', axis=1)
y = df['target']

Now that we have separated the input features (X) and target labels (y), let's split the dataset into training and validation sets. For this purpose, let's use Scikit-learn's train_test_split method.

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.2, 
                                                    random_state=42)
print("The length of training dataset: ", len(X_train))
print("The length of validation dataset: ", len(X_test))

In the above code, we have split the dataset such that the validation data contains 20% of the randomly selected samples from the whole dataset. Let’s now further do some processing before we create the model.

Data Processing

Since we have the data split ready, let's now do some basic processing: feature scaling and label encoding. The input features are the sepal and petal measurements, i.e. length and width in centimetres. These numerical features should be standardized, i.e. transformed so that each feature has a mean of 0 and a standard deviation of 1.

Let's use Scikit-learn's StandardScaler to do this.

# Fit the scaler on the training data only, then transform both splits
features_encoder = StandardScaler()
features_encoder.fit(X_train)
X_train = features_encoder.transform(X_train)
X_test = features_encoder.transform(X_test)

Next, we should encode the categorical target labels, because the model cannot work with categories represented as strings. Let's encode the labels using Scikit-learn's LabelEncoder.

# Fit the encoder on the training labels, then transform both splits
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
y_train = label_encoder.transform(y_train).reshape(-1, 1)
y_test = label_encoder.transform(y_test).reshape(-1, 1)
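As a small addition to the original flow, we can inspect which integer the encoder assigned to each species:

# The position of each class name in this array is the integer it was encoded to
print(label_encoder.classes_)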

Now let’s check the shapes of the datasets,

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

Great! Now we are ready to define and train our model.

Creating Model

Let's define the classification model using the Keras Sequential API, which lets us stack the required layers to build the model architecture. For this model, we use Dense layers for the input, hidden and output layers.

model = Sequential([
    layers.Dense(8, activation="relu", input_shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax")
])

In the above model, we have defined 4 Dense layers. The output layer has 3 neurons, equal to the number of output classes. We use the softmax activation in the final layer so that the model outputs a probability for each class; the class with the highest probability is the model's prediction. The hidden layers use the ReLU activation function.
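As a toy illustration with made-up numbers, a single softmax output row is just a probability distribution over the 3 classes, and the prediction is its argmax:

# Toy example: one softmax output row for a single sample (made-up numbers)
probs = np.array([0.05, 0.85, 0.10])  # probabilities for classes 0, 1 and 2
print(probs.sum())                    # softmax outputs sum to 1
print(np.argmax(probs))               # predicted class index -> 1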

Now let’s compile the model by defining the loss function, optimizer and metrics.

model.compile(optimizer=optimizers.SGD(),
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=[metrics.SparseCategoricalAccuracy()])

According to the above code, we use SGD (Stochastic Gradient Descent) as the optimizer with its default learning rate of 0.01. The loss function is SparseCategoricalCrossentropy. We use SparseCategoricalCrossentropy rather than CategoricalCrossentropy because our target categories are integers; CategoricalCrossentropy would be the right choice if the categories were one-hot encoded. Finally, we track SparseCategoricalAccuracy as the metric.
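To make the distinction concrete, here is a small sketch, purely illustrative and not part of the original code, showing the same labels in the two formats the two losses expect:

# Integer class ids work with SparseCategoricalCrossentropy
int_labels = np.array([0, 2, 1])
# One-hot vectors would be needed for CategoricalCrossentropy
one_hot = tf.keras.utils.to_categorical(int_labels, 3)
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]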

Now let’s train the model…

Model Training and Evaluation

Now let’s train our model using the processed training data for 200 epochs and provide the test dataset for validation.

history = model.fit(x=X_train,
          y=y_train,
          epochs=200,
          validation_data=(X_test, y_test),
          verbose=0)

Now we have trained our model using the training dataset. Before evaluation let’s check the summary of the model we have defined.

# Check model summary
model.summary()

Now let’s evaluate the model on the testing dataset.

# Perform model evaluation on the test dataset
model.evaluate(X_test, y_test)
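For this model, evaluate returns the loss followed by the tracked accuracy metric; if you prefer to print them explicitly, a small variation is:

# evaluate() returns [loss, sparse_categorical_accuracy] for this model
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")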

Those are great results. Now let's define some helper functions to plot the accuracy and loss curves.

# Function to plot loss
def plot_loss(history):
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.ylim([0, 10])
    plt.xlabel('Epoch')
    plt.ylabel('Error (Loss)')
    plt.legend()
    plt.grid(True)
    plt.show()

# Function to plot accuracy
def plot_accuracy(history):
    plt.plot(history.history['sparse_categorical_accuracy'], label='accuracy')
    plt.plot(history.history['val_sparse_categorical_accuracy'], label='val_accuracy')
    plt.ylim([0, 1])
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)
    plt.show()

Now let’s pass the model training history and check the model performance on the dataset.

plot_loss(history)
plot_accuracy(history)
[Figure: accuracy and loss curves over the training epochs]

We can see from the graphs above that the model has learned over time to classify the different species with high accuracy.

Save and Load Model

Now that we have a trained model, we can export it for further use, deploy it in applications, or continue training from where we left off. We can do this by using the save method and exporting the model in the H5 format.

# Save the model
model.save("trained_classifier_model.h5")

We can load the saved model checkpoint by using the load_model method.

# Load the saved model and perform classification
loaded_model = models.load_model('trained_classifier_model.h5')

Now let's get predictions from the loaded model. Since the model's final layer uses a softmax activation, predict returns class probabilities, so we use np.argmax() to pick the class with the highest probability for each sample.

# The results the model returns are softmax outputs i.e. the probabilities of each class.
results = loaded_model.predict(X_test)
preds = np.argmax(results, axis=1)
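Optionally, as a small addition to the original flow, the integer predictions can be mapped back to the species names using the same LabelEncoder:

# Map the integer predictions back to the original species names
pred_species = label_encoder.inverse_transform(preds)
print(pred_species[:5])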

Now we can evaluate the predictions by using metric functions.

# Evaluate the predictions against the ground-truth labels
print(accuracy_score(y_test, preds))
print(classification_report(y_test, preds))

Awesome! Our results match the previous ones.

Conclusion on Classification

In this article, we trained a deep neural network using TensorFlow to perform a basic classification task on tabular data. Using the above method, we can train classifier models on any tabular dataset with any number of input features. By leveraging the different types of layers available in Keras, we gain more control over model training and can further optimize it, improving the metric performance. It is recommended to replicate the above procedure on other datasets and to experiment with different hyperparameters, such as the learning rate, the number of layers and the optimizer, until the model reaches the desired performance; a small sketch of such an experiment follows.
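For example, one such experiment, with purely illustrative values, is to recompile the model with a different optimizer and learning rate and retrain it:

# Illustrative experiment (arbitrary values): recompile with Adam and a smaller learning rate
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=[metrics.SparseCategoricalAccuracy()])
history = model.fit(X_train, y_train, epochs=200,
                    validation_data=(X_test, y_test), verbose=0)

Comparing the resulting accuracy and loss curves with the SGD run above is a quick way to build intuition about these hyperparameters.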
