Neural Networks Part 2: Unveiling the Secrets of the Fruit Color Case

Hello Again!

Ready for another slice of neural network wisdom? Today, we’re pulling back the curtain on how these digital detectives, disguised as math-powered superheroes, predicted the colour of our mystery fruit in the previous part of this series. We’re going to be using maths today, so buckle up.

Inside the Neural Network’s Brain (Bits and Bobs):

Figure 1: A very simple neural network with 2 input neurons and 1 output neuron.

Imagine our detective is a rookie, untrained for action. It’s his first day on the job and he already has to deal with tons of data…what a great start!

He picks up the data set and decides to have a look at the first fruit on the list and guess its colour without looking at the answer:

Row | Fruit Weight (grams) | Sweetness Level | Color (0=Green, 1=Red)
1   | 117                  | 6               | 0 (Green)

Let’s imagine that the neural network shown in Figure 1 is our detective’s brain. He will feed the value 117 into input 1 and 6 into input 2 of his brain. It’ll then spit out a value between 0 and 1 at its output. The output’s task is to guess the colour of the fruit. The first time around, the detective’s brain guesses a value of 1. This means that it was wrong, because 1 means red, and the colour of the fruit is in fact green…

Let’s take a closer look at how his brain arrived at this number. This is where the math starts, but we love math, so let’s get on with it!

The output is essentially a function. Nothing less, nothing more. Here’s what this function looks like:

\text{Output} = \text{Activation Function} \left( \sum (\text{Inputs} \times \text{Weights}) + \text{Bias} \right)

Ok, that’s great…but what exactly does it do to the numbers we fed into it? Why is it called an activation function, and what the hell are those weights and that bias? That’s a lot of questions. Let’s take them step by step.

Weights

Basic Concept:

  • Influence of Inputs: Weights determine the influence of each input on the neuron’s output. You can think of weights as factors that decide how much impact each input will have on the final decision made by a neuron (aka the brain’s output).

Functionality:

  • Weighted Sum Calculation: In a neural network, each neuron calculates a weighted sum of its inputs. This sum is then processed by an activation function to produce the neuron’s output.
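To make this concrete, here is a minimal Python sketch of what a single neuron does: multiply each input by its weight, add everything up together with the bias, and pass the result through an activation function (the sigmoid used here is one common choice, covered later in this post). The input, weight, and bias values are made up purely for illustration.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs, plus the bias, passed through the activation function
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Illustrative values only: two inputs, two weights, one bias
print(neuron_output([1.0, 2.0], [0.4, -0.6], 0.1))  # prints roughly 0.33
```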

Purpose:

  • Pattern Learning: Weights are adjusted during the training process to help the network learn patterns from the data. By fine-tuning these weights, the network can make more accurate predictions or categorizations.
  • Capturing Relationships: Weights capture the relationships between inputs and outputs. High weights imply a stronger influence of an input on the output, whereas low weights indicate a lesser influence.

Training Process and Weight Adjustment:

  1. Initial Setting: Weights are initially set to random values or according to specific initialization strategies.
  2. Backpropagation: During training, weights are adjusted based on the error between the network’s prediction and the actual output. This process uses the backpropagation algorithm to compute the gradient of the loss function with respect to each weight, which we’ll talk about soon.
  3. Gradient Descent: The network employs gradient descent or similar optimization techniques to update the weights, aiming to reduce the loss. We’ll get into more detail later.
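As a rough illustration of step 3, here is what a single gradient-descent update for one weight looks like in Python. The gradient value is made up for this sketch; in a real network it comes out of backpropagation.

```python
learning_rate = 0.01   # how big a step we take on each update
w = 0.5                # current value of one weight
grad_w = 2.3           # dLoss/dw for this weight (made-up number; backpropagation computes it)

# Step against the gradient to reduce the loss
w = w - learning_rate * grad_w
print(w)  # 0.477
```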

In Summary:

  • Weights are dynamic parameters that the neural network adjusts through learning. They are crucial for the network’s ability to make sense of the input data and produce accurate outputs. Proper tuning of weights is essential for the effectiveness of a neural network in tasks such as classification, regression, or any predictive modeling.

Bias

Bias in a neural network is a key concept that helps the network make more accurate predictions.

Basic Concept:

  • Offset to Decision Making: Think of bias as an offset or adjustment to the decision-making process of a neuron in the neural network. It’s like a starting point or base level for the neuron’s output.

Functionality:

  • Adjusts Activation Function: Each neuron in a neural network calculates a weighted sum of its inputs and then applies an activation function to this sum. Bias is an additional parameter added to this weighted sum before the activation function is applied.

Purpose:

  • Fine-Tuning Output: Adjusting the bias is crucial for fine-tuning the neuron’s output, allowing the network to better fit the training data.
  • Improving Model Accuracy: By continually adjusting biases (and weights), the network incrementally improves its accuracy in predicting or categorizing data.

Training Process and Bias Adjustment:

  1. Initial Setting: When a neural network is first initialized, biases (like weights) are typically set to random values or start with a predefined initialization strategy.
  2. Learning from Data: During training, the network processes input data and compares its output against the desired outcome, typically using a loss function to measure the difference.
  3. Backpropagation: The core of training involves backpropagation, where the network adjusts its weights and biases in response to the error calculated by the loss function. This process involves calculating the gradient of the loss function with respect to each weight and bias in the network.
  4. Gradient Descent: By applying an optimization algorithm, often gradient descent or one of its variants, the network updates its weights and biases in the direction that reduces the error.
  5. Bias Update: The bias of each neuron is updated based on its contribution to the overall error. The update is proportional to the gradient of the loss with respect to the bias and the learning rate, a parameter that determines the size of the steps taken during optimization.
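Putting steps 2 to 5 together for one sigmoid neuron, a minimal training step might look like the sketch below. It assumes a squared-error loss and made-up input values; a real network repeats this for many examples and many neurons.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# One training example and some illustrative starting parameters
x = [0.8, 0.2]   # inputs
y = 0.0          # desired output
w = [0.5, 0.5]   # weights
b = 0.3          # bias
lr = 0.1         # learning rate

# Forward pass: weighted sum plus bias, then the activation function
z = sum(xi * wi for xi, wi in zip(x, w)) + b
y_hat = sigmoid(z)

# Gradients of a squared-error loss, (y_hat - y)^2, via the chain rule
dL_dz = 2 * (y_hat - y) * y_hat * (1 - y_hat)
grad_w = [dL_dz * xi for xi in x]
grad_b = dL_dz

# Gradient-descent update of both the weights and the bias
w = [wi - lr * gwi for wi, gwi in zip(w, grad_w)]
b = b - lr * grad_b
```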

In Summary:

  • Just like weights, biases are dynamic parameters in a neural network. Their continuous adjustment during the training process is essential for the network’s ability to learn from data and make accurate predictions. Without updating the bias, the network might fail to represent the patterns in data accurately, especially in cases where the data distributions require shifts in the activation function.

Activation Functions

Yes, that’s right. Activation functions, not one activation function. There are many activation functions. Here is a description of some of the most commonly used ones.

Sigmoid Function:

  • Characteristic S-shaped curve.
  • Smooth gradient, preventing “jumps” in output values.
  • Output values bound between 0 and 1, making it useful for models where prediction needs to be normalized.
  • Formula: \sigma(x) = \frac{1}{1 + e^{-x}}

Hyperbolic Tangent Function (Tanh):

  • Similar to the sigmoid but with output values ranging from -1 to 1.
  • It is zero-centered, making it easier in some cases to model inputs that have strongly negative, neutral, and strongly positive values.
  • Formula: \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

Mapping outputs to a specific range, like 0 to 1, normalizes the outputs of neurons. This normalization can help keep the values within a manageable scale, preventing values from becoming too large or too small (a problem known as exploding or vanishing gradients, which can hinder the learning process).
When outputs are mapped between 0 and 1, they can be also interpreted as probabilities. This is particularly useful in tasks like classification, where we want to assign probabilities to different classes. The sigmoid function, which maps values to the range (0, 1), is often used for binary classification tasks for this reason.

Rectified Linear Unit Function (ReLU):

  • Outputs the input directly if it is positive, otherwise, it outputs zero.
  • It has become the default activation function for many types of neural networks because it allows models to converge faster and perform better.
  • Formula: \text{ReLU}(x) = \max(0, x)

Leaky ReLU Function:

  • Similar to ReLU but allows a small, non-zero gradient when the input is negative.
  • This helps to alleviate the problem of “dying neurons” in a neural network.
  • Outside of the scope of this lesson, but you can research it on your own if you want!

Each of these activation functions has its own characteristics and is chosen based on the specific requirements of the neural network and the task at hand. We’re going to focus on the sigmoid function in this lesson.
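If it helps to see them side by side, here are the four activation functions written out as short Python functions (the 0.01 slope for Leaky ReLU is just a commonly used illustrative value):

```python
import math

def sigmoid(x):
    # S-shaped curve, output in (0, 1)
    return 1 / (1 + math.exp(-x))

def tanh(x):
    # Zero-centred version, output in (-1, 1)
    return math.tanh(x)

def relu(x):
    # Passes positive values through unchanged, outputs zero otherwise
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # Like ReLU, but keeps a small slope for negative inputs
    return x if x > 0 else slope * x
```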

Ok, so knowing all that, let’s try to calculate the output we’d get for the data set row shown above and the following weights and bias:

w1 = 0.5

w2 = 0.5

b = 0.3

The weights and bias all start out as random values for the first epoch (or iteration) of training and then get adjusted based on the prediction error. That’s where those values came from.

\text{Output} = \sigma \left( \sum (\text{Inputs} \times \text{Weights}) + \text{Bias} \right) = \sigma \left( (\text{Weight} \times w_1) + (\text{Sweetness} \times w_2) + b \right) = \sigma \left( (117 \times 0.5) + (6 \times 0.5) + 0.3 \right)

Adding it all up, we get 61.8. We now need to plug it into the sigmoid function to get…

\text{Output} = \sigma(61.8) = \frac{1}{1 + e^{-61.8}} \approx 1

So the output equals 1, or rather a number very close to 1 as the sigmoid function never quite reaches 1.
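You can check this calculation with a few lines of Python; the result is so close to 1 that standard floating-point arithmetic rounds it to exactly 1.0.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

z = (117 * 0.5) + (6 * 0.5) + 0.3   # weighted sum plus bias
print(z)            # 61.8
print(sigmoid(z))   # 1.0 (a hair below 1 mathematically, but indistinguishable at double precision)
```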

We know that the output for this particular row of our data set should be 0 though. So how can the detective tell that it’s wrong? We’re going to explore this question in the next part of this series. Stay tuned!
