Building a super simple Neural Network in Python
Artificial Intelligence?!
AI is having a moment right now. Let's be real here: since ChatGPT launched, everyone has been talking about artificial intelligence, machine learning, deep learning, killer robots...
The basic neural network training algorithm, back-propagation, is the bare-bones foundation of all deep learning systems. Before we build one in Python, let's take a look under the hood.
What is a Neural Network?
Imagine you're given the task of recognizing handwritten digits (0-9) on a piece of paper. You want the computer to learn how to do it automatically, just like you would. Here's how a neural network can help.
Here is a rundown of the components that make up the architecture:
Neurons: Think of neurons as little decision-making units. They take input from somewhere and produce an output (there's a tiny code sketch of a single neuron after this list).
Layers: Neurons are organized into layers. Each layer can have multiple neurons, and they work together to solve a specific part of the problem.
Input Layer: This layer receives the raw information, in our case, the image of the handwritten number. Each neuron in this layer represents one pixel of the image.
Hidden Layers: These layers are in the middle. They process and transform the information from the input layer. They learn patterns and features that help identify the numbers.
Output Layer: This is the final layer that provides the result. Each neuron here represents one possible number (0-9). The neuron with the highest activation (output value) indicates the predicted number.
Training: To teach the neural network, we need examples. We show it many images of handwritten numbers along with their correct labels. It adjusts its internal "weights" to improve its accuracy over time.
Activation Function: It determines if a neuron "fires" or not, meaning it produces an output. It adds non-linearity to the network, helping it learn complex relationships between inputs and outputs.
Learning: During training, the neural network compares its predictions with the correct answers and calculates how wrong it was. It then adjusts its weights to minimize this error.
Optimization: The process of adjusting the weights to reduce the error is done using optimization algorithms like "Adam." These algorithms update the weights in an efficient way to improve the network's performance.
Testing: Once the network is trained, we test it on new, unseen images to check its accuracy. We compare its predictions with the correct labels to evaluate its performance.
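To make the "neuron" idea concrete, here is a minimal sketch of a single artificial neuron in plain Python with NumPy. The inputs, weights, and bias are made-up illustrative values, not anything learned from data:

import numpy as np

def relu(z):
    # ReLU activation: positive values pass through, negatives become zero
    return np.maximum(0, z)

# Illustrative (made-up) values: three inputs, three weights, one bias
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8, 0.2, -0.5])
bias = 0.1

# A neuron computes a weighted sum of its inputs plus a bias...
z = np.dot(inputs, weights) + bias
# ...then passes it through an activation function to produce its output
output = relu(z)
print(output)  # 0.0 here, because the weighted sum came out negative

Training is then just the process of nudging those weights and the bias until the outputs across the whole network match the correct labels more often.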
That's the basic idea behind neural networks. By organizing neurons into layers and teaching them with examples, the network can learn to recognize patterns and make predictions. It's like teaching a computer to see and understand things, but in a simplified way!
The problem...
The problem we are going to solve today is identifying hand-drawn digits. This is a classic deep learning academic problem, commonly used to benchmark the performance and efficiency of computer vision models.
The Code
OK, so up until now we have looked at a very basic overview of a neural network architecture. Now let's take a look at the boilerplate code, followed by an explanation of each stage. Although we have covered the high-level aspects, there is some optimization and structure we still need to put in place.
import tensorflow as tf
from tensorflow import keras

# Define the model
model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(784,)),  # Hidden layer with 32 nodes, taking the 784 flattened pixels as input
    keras.layers.Dense(10, activation='softmax')  # Output layer with 10 nodes (one per digit class)
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load and preprocess the dataset (e.g., MNIST)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten and normalize the input images
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
In this example, we define a simple neural network with an input layer consisting of 784 nodes (corresponding to the flattened 28x28 images of the MNIST dataset), a hidden layer with 32 nodes, and an output layer with 10 nodes representing the classes (digits 0-9). The activation function used in the hidden layer is ReLU, and the output layer uses the softmax activation function.
The model is then compiled with the Adam optimizer, sparse categorical cross-entropy loss function, and accuracy as the metric to optimize.
We load the MNIST dataset, preprocess it by reshaping and normalizing the input images, and then proceed to train the model using the training data for 5 epochs with a batch size of 32.
Finally, we evaluate the trained model using the test data and print the test accuracy.
OK, but what does all of that even mean?!
Neural Network: A neural network is a computational model inspired by the human brain's structure and functioning. It consists of interconnected nodes (neurons) organized in layers. Each node takes input, performs a mathematical operation, and produces an output.
Flattened Data: In the example, the input images from the MNIST dataset are flattened. This means that the 2D images, which are originally 28x28 pixels, are reshaped into a 1D array of 784 pixels. Flattening simplifies the input representation for the neural network.
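As a quick illustration, here's what flattening looks like on a toy example with NumPy; the reshape call in the code above does the same thing to every image in the dataset at once:

import numpy as np

# A toy 2x2 "image" standing in for a 28x28 one
image = np.array([[1, 2],
                  [3, 4]])

flat = image.reshape(-1)  # -> array([1, 2, 3, 4]), a 1D array
print(flat.shape)         # (4,) -- for a 28x28 image this would be (784,)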
MNIST Dataset: The MNIST dataset is a widely used benchmark dataset in machine learning. It consists of a collection of grayscale images of handwritten digits (0-9) and their corresponding labels. It is often used for training and evaluating image classification models.
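If you want to peek at the dataset yourself, Keras can download it in one line; the shapes printed below are the standard MNIST split:

from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape)  # (60000, 28, 28) -- 60,000 training images, 28x28 pixels each
print(x_test.shape)   # (10000, 28, 28) -- 10,000 test images
print(y_train[:5])    # the first five labels, e.g. [5 0 4 1 9]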
Hidden Layer: In a neural network, a hidden layer is a layer between the input layer and the output layer. It performs computations on the input data and extracts relevant features. The nodes in the hidden layer are not directly connected to the input or output.
Nodes: Nodes, also known as neurons, are the basic units in a neural network. Each node receives inputs, performs a mathematical operation (e.g., a weighted sum), and applies an activation function to produce an output. Nodes in a layer are typically connected to nodes in the adjacent layers.
Output Layer: The output layer is the final layer in a neural network. It produces the network's output based on the computations performed in the preceding layers. The number of nodes in the output layer depends on the problem at hand. In this example, there are 10 nodes in the output layer, one for each possible digit (0-9).
Activation Function: An activation function introduces non-linearity to the output of a node in a neural network. It helps the network learn complex patterns and make predictions. In this example, two activation functions are used:
ReLU (Rectified Linear Unit): It returns the input if it is positive, and zero otherwise. ReLU is commonly used in hidden layers to introduce non-linearity and handle complex data relationships.
Softmax: It converts the output of the last layer into a probability distribution over multiple classes. It ensures that the predicted class probabilities sum up to 1, making it suitable for multi-class classification problems.
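Both functions are simple enough to write by hand. Here's a minimal NumPy sketch of each, just to show the math (Keras ships its own implementations, so you'd never need to write these yourself):

import numpy as np

def relu(z):
    # Negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, z)

def softmax(z):
    # Subtracting the max first is a standard trick for numerical stability
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores))     # [2.  0.  0.5]
print(softmax(scores))  # probabilities that sum to 1, largest for the 2.0 score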
Adam Optimizer: Adam (Adaptive Moment Estimation) is an optimization algorithm commonly used to update the weights and biases in neural networks during training. It adapts the learning rate dynamically based on the gradient's magnitude, making it efficient in many scenarios.
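For the curious, here is a stripped-down, single-parameter sketch of the Adam update rule, just to show the idea of running averages plus bias correction. This is not TensorFlow's actual implementation, and the values in the example step are made up:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running average of the gradient (momentum-like term)
    m = beta1 * m + (1 - beta1) * grad
    # Running average of the squared gradient (scales the step size)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v both start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update: larger recent gradients -> smaller effective step
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# One illustrative step with a made-up parameter value and gradient
theta, m, v = adam_step(theta=0.5, grad=0.2, m=0.0, v=0.0, t=1)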
Categorical Cross-Entropy Loss Function: The loss function measures the discrepancy between the predicted output and the actual output. Categorical cross-entropy is a commonly used loss function for multi-class classification problems. It calculates the loss by comparing the predicted class probabilities with the true class labels. The "sparse" variant used in our code simply means the labels are plain integers (0-9) rather than one-hot vectors.
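Here's the loss on a single example, computed by hand with NumPy; the predicted probabilities are made up for illustration:

import numpy as np

# Made-up predicted probabilities for the 10 digit classes (they sum to 1)
probs = np.array([0.05, 0.05, 0.1, 0.6, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02])
true_label = 3  # the image is actually a "3"

# Cross-entropy: the negative log of the probability assigned to the true class
loss = -np.log(probs[true_label])
print(loss)  # ~0.51 -- would approach 0 if the network were confidently correct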
Reshaping: Reshaping refers to changing the shape or dimensions of the data while preserving the total number of elements. In the example, the input images are reshaped from a 2D matrix (28x28 pixels) to a 1D array (784 pixels) to match the input requirements of the neural network.
Epochs: In machine learning, an epoch refers to a complete iteration through the entire training dataset during model training. In the example, the model is trained for 5 epochs, meaning that the entire training dataset is passed through the network 5 times.
Batch Size: During training, the dataset is divided into smaller subsets called batches. The batch size determines the number of samples the network processes before its weights are updated.
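A quick bit of arithmetic ties epochs and batch size together for our training run:

# MNIST has 60,000 training images; our batch size is 32
steps_per_epoch = 60000 // 32          # 1875 weight updates per epoch
total_updates = steps_per_epoch * 5    # 9375 updates over 5 epochs
print(steps_per_epoch, total_updates)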
So what have we learned?
We learned the basic components of a Neural Network Architecture and looked at an example with a description for each of the parameters!
This is a great first step on the road to truly understanding deep learning! Even if the concepts were a little tough this time!
Don't forget to keep learning!