I admit, I’m late to the whole Neural Network party.  With all of the major news coverage of AI systems that use neural networks as part of their implementation, you’d have to be living under a rock to not know about them.  While it’s true that they can provide more flexible models than other machine learning algorithms, they can be challenging to work with.

The idea behind Neural Networks

A Neural Network is a system of neurons that are connected to provide some functionality.  The adult human brain consists of about 100 billion neurons.  Each neuron is responsible for relaying information from one neuron to the next.  The biological mechanism by which neurons actually relay information is irrelevant here, but think of a neuron as a function that takes in an input and gives an output.

For us, we use a group of neurons to perform a certain task, ranging from smell to movement.  Additionally, with each activity, the group of neurons associated with that task will be optimized to perform better.

What about Artificial Neural Networks?

The idea above is similar to how artificial neural networks work.  For a visual, here’s the structure of a basic neuron.

A basic structure of an artificial neuron.

Combining individual neurons will produce a layer.  Stacking layers on top of each other will produce a network.

A neural network with 3 layers

The number of neurons in the input layer is simply the number of inputs that you will be feeding into the network.  The number of neurons in the output layer is determined by the number of categories that can be returned.  Unfortunately, as far as I know, the number of neurons in a hidden layer is found by trial and error.  It is possible to have multiple hidden layers, but this brings the drawbacks of slower training times and potential problems with accuracy, such as overfitting.
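To make the sizing rules concrete, here is a minimal Python sketch.  The function name layer_shapes and the example sizes are my own, purely for illustration.  It assumes the input layer just passes the raw inputs along, and that each later layer stores its weights as a matrix with one row per incoming value and one column per neuron.

```python
def layer_shapes(n_inputs, hidden_sizes, n_outputs):
    """Return the (rows, cols) shape of each layer's weight matrix.

    rows = number of values coming into the layer,
    cols = number of neurons in the layer.
    """
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    # Pair each layer's size with the size of the layer that feeds it.
    return [(n, s) for n, s in zip(sizes[:-1], sizes[1:])]


# Hypothetical network: 4 input features, one hidden layer of 5 neurons
# (a trial-and-error choice), and 3 output categories.
shapes = layer_shapes(4, [5], 3)
print(shapes)  # prints [(4, 5), (5, 3)]
```

Changing the hidden layer size only changes the inner numbers, which is why it is the part you end up experimenting with.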

Calculating a Neural Network

With our basic neuron, let’s figure out what’s going on.

Note that when computing neural networks, we represent layers as a matrix.

Here, we have numerous input values x that are fed into the neuron.

\[ \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1m} \\ x_{21} & x_{22} & \dots & x_{2m} \\ \dots &\dots & \dots & \dots \\ x_{n1} & x_{n2} & \dots & x_{nm} \\ \end{bmatrix} \]

Our inputs are represented as an n*m matrix, where n is the number of inputs and m is the number of training examples.

The matrix W holds our weights for a particular layer.

\[ \begin{bmatrix} w_{11} & w_{12} & \dots & w_{1s} \\ w_{21} & w_{22} & \dots & w_{2s} \\ \dots & \dots & \dots & \dots \\ w_{n1} & w_{n2} & \dots & w_{ns} \\ \end{bmatrix} \]

Our weights are represented by an n*s matrix.  Here, s is the number of neurons in a layer.  This matrix holds one weight for each connection between an input and a neuron, and it is adjusted after each training step.  After training, the weights should be tuned well enough that the network can determine the output for a given dataset.

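Finally, there is a bias that is added to the computed product of x and W.  This bias matrix should be of size s*m.  In practice, it is usually stored as one bias per neuron (size s*1) and repeated across the m training examples.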
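As a quick sanity check on the three shapes described so far, here is a small sketch in plain Python, with matrices stored as lists of rows.  The sizes n, m and s and the helper name shape are made up for illustration, and the fill values are placeholders.

```python
def shape(matrix):
    """Return (rows, cols) of a matrix stored as a list of rows."""
    return (len(matrix), len(matrix[0]))


# Hypothetical sizes: 3 inputs, 4 training examples, 2 neurons in the layer.
n, m, s = 3, 4, 2

x = [[0.0] * m for _ in range(n)]  # inputs:  n*m
W = [[0.1] * s for _ in range(n)]  # weights: n*s
b = [[0.0] * m for _ in range(s)]  # bias:    s*m

print(shape(x), shape(W), shape(b))  # prints (3, 4) (3, 2) (2, 4)
```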

With this information, we can perform forward propagation with our neural network.  We compute values one layer at a time.  In our case, we’ll define z_i as a matrix holding the outputs for layer i.  Our formula is defined below:

\[ z_i = W_i^T x + b \]

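Here, W_i is transposed so that the shapes line up: W_i^T is s*n and x is n*m, so their product is s*m, which matches the bias.  Some texts store W_i as s*n already and skip the transpose, but so long as you end up with a matrix of size s*m, you should be able to compute z_i.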
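Here is one way the formula for a single layer could be sketched in plain Python.  The helper names (transpose, matmul, linear_step) and the numbers are my own, not from any library.  A real implementation would normally use a matrix library instead of nested lists.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def matmul(left, right):
    """Multiply a p*q matrix by a q*r matrix, giving a p*r matrix."""
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*right)]
            for row in left]


def linear_step(W, x, b):
    """Compute z = W^T x + b for one layer (no activation yet)."""
    product = matmul(transpose(W), x)  # (s*n) times (n*m) gives s*m
    return [[p + bias for p, bias in zip(p_row, b_row)]
            for p_row, b_row in zip(product, b)]


# Hypothetical values: n = 2 inputs, m = 2 examples, s = 3 neurons.
x = [[1.0, 2.0],
     [3.0, 4.0]]        # n*m
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]   # n*s
b = [[0.5, 0.5],
     [0.5, 0.5],
     [0.5, 0.5]]        # s*m

z = linear_step(W, x, b)
print(z)  # prints [[1.5, 2.5], [3.5, 4.5], [4.5, 6.5]]
```

Each row of z belongs to one neuron and each column to one training example.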

After computing z_i, you need to feed z_i into an activation function h_\theta(z_i).  The activation function makes the layer non-linear and, depending on the function, restricts the range of outputs that are returned.

There are numerous types of activation functions that can be used:

  • Sigmoid function – Similar to the function used in logistic regression.
  • ReLU function – A basic function that returns the input if it is positive and 0 otherwise.
  • Softmax function
  • Hyperbolic tangent
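The four functions in the list above can be sketched with Python’s standard math module.  These are the textbook definitions.  The branching in sigmoid and the shift in softmax are my own additions to avoid overflow on large inputs.

```python
import math


def sigmoid(z):
    """Squash z into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form so exp() never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    """Return z if it is positive and 0 otherwise."""
    return max(0.0, z)


def tanh(z):
    """Squash z into the range (-1, 1)."""
    return math.tanh(z)


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(zs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))            # prints 0.5
print(relu(-2.0), relu(3.0))   # prints 0.0 3.0
print(tanh(0.0))               # prints 0.0
print(softmax([1.0, 2.0, 3.0]))
```

Note that softmax works on a whole list of values (one per output neuron), while the other three work on one value at a time.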

You can use a different activation function for each layer.  For example, in our three-layer neural network, the input layer can use ReLU, the hidden layer can also use ReLU, and our output layer can use the sigmoid function.

For the output layer, you want your neuron(s) to produce a probabilistic value from 0 to 1.
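Putting the pieces together, a full forward pass might look like the sketch below.  It is a simplified illustration under my own assumptions: the input layer just passes the raw values along, the hidden layer uses ReLU, and a single output neuron uses sigmoid.  All function names and weights are hypothetical, and the weights are hand-picked rather than trained.

```python
import math


def matmul(left, right):
    """Multiply a p*q matrix by a q*r matrix (lists of rows)."""
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*right)]
            for row in left]


def dense(W, x, b, activation):
    """One layer: apply the activation to every entry of W^T x + b."""
    W_t = [list(col) for col in zip(*W)]
    product = matmul(W_t, x)
    return [[activation(p + bias) for p, bias in zip(p_row, b_row)]
            for p_row, b_row in zip(product, b)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Naive form; fine for the small values used in this example.
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """Feed x through each (W, b, activation) layer in order."""
    a = x
    for W, b, activation in layers:
        a = dense(W, a, b, activation)
    return a


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output, 1 example.
x = [[1.0], [2.0]]
hidden = ([[1.0, -1.0], [1.0, -1.0]], [[0.0], [0.0]], relu)
output = ([[1.0], [1.0]], [[-3.0]], sigmoid)

print(forward(x, [hidden, output]))  # prints [[0.5]]
```

The hidden layer produces [[3.0], [0.0]] after ReLU, the output neuron computes 3.0 + 0.0 - 3.0 = 0.0, and sigmoid maps that to 0.5, a value between 0 and 1 as described above.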

When training neural networks, back propagation is performed to adjust the weights for the next iteration.  I won’t be covering back propagation here since I don’t have an exact understanding of the mathematics involved.

Pros and Cons

Although people constantly claim that neural networks will lead to superhuman intelligence, neural networks aren’t a panacea for every problem.  As with other models, neural networks have their tradeoffs.

First the pros:

  • Neural networks can be used to represent really complex models that other methods cannot.  For example, they are often used in speech and image recognition.
  • Neural networks can be fault tolerant.  If there is bad data, the network would degrade gradually in performance rather than fail abruptly.  Of course, with too much bad data the network won’t be reliable.
  • Since calculations can be performed in parallel, training neural networks can be sped up considerably with the right hardware.

Due to these pros, the idea of neural networks has spawned Deep Learning and Reinforcement Learning, which improve a neural network’s ability to perform certain tasks.

Despite these benefits, there are several drawbacks:

  • Neural networks require far more training data than other machine learning algorithms.
  • Neural networks can take a really long time (hours, if not days) to train.  The amount of data and the number of layers in the network greatly affect the training time.
  • They can be really hard to debug due to their black-box nature.
  • For complex models, you’ll often need to use specialized hardware, such as GPUs, to greatly speed up training.  TPUs are much better, but you can’t buy one at the time of writing.  However, you can cheaply rent one on the cloud.


Although the idea of neural networks has been around since at least the 1950s, they have seen a recent resurgence due to cheaper, faster hardware and more data.  They have allowed for significant advancements ranging from computer vision to text recognition.  They even serve as a basis for self-driving cars.

However, one thing to note is that, as with other machine learning algorithms, neural networks cannot solve everything.  They are best used for problems that cannot be easily represented with simpler models.

Missed anything?  Have a cool use for neural networks?  Share your comments down below.