Comparing FeedForward, Backpropagation, and Recurrent Neural Networks

Neural Networks for Machine Learning on Coursera is a difficult course. By the end, I had learned a lot of new things, and likewise there were a few I did not fully understand. I want to start summarizing what I learned before I forget it.

First, let me compile the different types of Neural Networks that I have learned about and the basic distinctions in how they operate. Then, I will try to work through the details.

Feed-Forward Neural Network:

This is the most common form of Artificial Neural Network and the easiest one to understand. There is an Input layer, an Output layer, and zero or more layers of Hidden units. In this network, the information moves only in the forward direction: from the input nodes, through the hidden nodes (if any), and to the output nodes. There are no cycles or loops in the network.

[Figure: FeedForward Net]

The leftmost layer is the Input layer, the rightmost is the Output layer, and the middle layer(s) are the Hidden layers. The neurons in these hidden layers are the Activation Units.

The simplest kind is a single-layer perceptron network, where there is a single layer of output nodes and the inputs are fed directly to the outputs via a series of weights. If there is more than one layer of Hidden units, the network is called a “Deep Neural Network”.

How does it work?

  • Initialize the weights with some random values; the Bias is usually set to 1.
  • Calculate the value at each hidden node as the weighted sum of the inputs to that node.
  • Apply an activation function, such as the Sigmoid function, to calculate the activation value at each hidden node.
  • Pass the activations forward to the output layer, where the values at the output nodes are finally calculated as a function of the input values (a minimal sketch follows this list).
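
To make these steps concrete, here is a minimal NumPy sketch of a single forward pass through one hidden layer. The layer sizes, the input values, and the random seed are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squash values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# One forward pass: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])   # one input example

W1 = rng.normal(size=(4, 3))     # input-to-hidden weights (random init)
b1 = np.ones(4)                  # hidden bias, set to 1
W2 = rng.normal(size=(1, 4))     # hidden-to-output weights
b2 = np.ones(1)                  # output bias, set to 1

a1 = sigmoid(W1 @ x + b1)        # activation values at the hidden nodes
y_hat = sigmoid(W2 @ a1 + b2)    # output, finally a function of the inputs
print(y_hat)
```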

A FeedForward network is a supervised learning model: starting from random weights, we train it on inputs paired with their corresponding outputs. Then we pass some test input data for prediction.

Example: Image Classification or Character Recognition.

Thus, the key to a great Model is finding the optimal weight values for accurate predictions. This can be done by adjusting the weights, learning from the errors. To improve the Model this way, we need the Backpropagation Algorithm.

Backpropagation Algorithm:

Backpropagation is a Supervised learning algorithm to train multi-layer perceptrons. It can be considered an extension of the FeedForward Neural Network, where the weights are adjusted based on the difference between the Model output and the sample (target) output.

How does it work?

  • Initialize the weights and bias with some random values.
  • Perform feedforward to get an output.
  • Calculate the error, i.e., the difference between the target output and the model output.
  • If the error is large, update the weights and the bias value, perform feedforward again, and recalculate the error. Basically, we need to decide whether to increase or decrease each weight: if the error increases when a weight is increased, we update that weight to some lower value and recalculate the error.
  • Repeat this process until the error is at a minimum.
  • The weights and bias values for which the error is at a minimum are considered the optimal solution for the Model (a toy sketch of this loop follows).
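
Here is a toy sketch of that loop for a single sigmoid neuron, using gradient descent to decide the direction of each weight update. The input, target, and learning rate are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = np.array([0.5, -1.2, 3.0])   # one input example
target = 0.8                     # the sample (desired) output

w = rng.normal(size=3)           # random initial weights
b = 1.0                          # bias initialized to 1
lr = 0.5                         # learning rate (step size)

for step in range(100):
    y_hat = sigmoid(w @ x + b)   # feedforward
    error = target - y_hat       # target output minus model output
    # Gradient of the squared error w.r.t. the pre-activation
    # (chain rule through the sigmoid):
    grad = -error * y_hat * (1 - y_hat)
    w -= lr * grad * x           # step each weight in the direction that lowers the error
    b -= lr * grad
print(error)                     # error shrinks toward a minimum
```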

Recurrent Neural Network (RNN):

An RNN is a sequence model, useful for speech recognition or natural language processing. It is a more powerful and complex Artificial Neural Network than the FeedForward Neural Network. RNNs have directed cycles in their connection graph: if we start at one node and follow the arrows, it is possible to come back to the same neuron where we started.

They have the capacity to retain information in their hidden state for a long time, but it is difficult to train Recurrent Nets to use this potential.

[Figure: Recurrent Net]

How does it work?

Now, that’s a bit theoretical, right? For a more practical example, suppose we are watching a movie and trying to guess what could happen next. The previous events in the movie, or similar events in some other movie, may help us anticipate the later events.

So, that is experience retained in the neurons in our brain. Recurrent Nets work in much the same way: they allow information to persist. Thus a recurrent net considers the current input along with the learnings from previous inputs to form a prediction.

How is it different from feedforward?

A Feedforward Neural Network is efficient at classification problems, so it is common to use feedforward networks for character recognition or pattern classification. For example, a system is fed lots of images of hand-written letters as input, and the trained system is then used to predict some new hand-written characters.

Now, what if the requirement is to analyze a sequence of events and not just a single instance? If we need to analyze an entire sequence of inputs that changes over time, an RNN is a possible solution. An RNN can be viewed as a layered feedforward network expanded over time, where the same weights are shared across all timestamps (see the sketch below).
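
Here is a minimal sketch of that unrolling, with arbitrary sizes: the same weight matrices are reused at every timestamp, and only the hidden state changes as the sequence is read:

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_a, n_y = 5, 8, 2             # input, hidden-state, and output sizes

Wax = rng.normal(size=(n_a, n_x))   # input -> hidden (shared across time)
Waa = rng.normal(size=(n_a, n_a))   # previous hidden -> hidden (shared)
Wya = rng.normal(size=(n_y, n_a))   # hidden -> output (shared)
ba, by = np.zeros(n_a), np.zeros(n_y)

xs = [rng.normal(size=n_x) for _ in range(4)]   # a sequence of 4 inputs
a = np.zeros(n_a)                               # initial hidden state

for x_t in xs:
    a = np.tanh(Waa @ a + Wax @ x_t + ba)       # carry learning forward in time
    y_t = Wya @ a + by                          # output at this timestamp
    print(y_t)
```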

[Figure: Recurrent Net with 3 inter-connected neurons]
Example:

Let me try to give an example.

In WhatsApp, while writing a message, if we type “Happy”, we can see the two most likely next words suggested below: “birthday” and “belated”. In Gmail, while writing a message, when I typed “Happy” the system guessed the next word as “Thanksgiving”. I have no idea what Model Google or Facebook uses for this prediction, and that’s not important here. The two predictions are different, but both are appropriate. The system must be fed lots of input sentences, and the Model predicts the next probable word based on that previous learning.

Let’s work with a small example here.

I will use an example provided by Professor Andrew Ng because that example actually cleared my concepts. We have the below input sentence from the Harry Potter series:

“Harry Potter and Hermione Granger invented a new spell.”

The task is to build a Sequence Model that tells where the names are in this sentence. This Named Entity Recognition system has a lot of practical applications. Our aim is to return an output y with one value per input word, telling us which words are part of a person’s name. Thus,

x: Harry  Potter  and  Hermione  Granger  invented  a  new  spell
y:   1      1      0      1         1        0      0   0     0

Explanation:

When we read a sentence from left to right, the first word, read at timestamp 1, is an input, say x<1>. We feed this input into a Neural Network hidden layer and predict the output, say y<1>.

Then the second word is fed as input x<2> to another hidden layer of the neural network. Now, while predicting the output y<2>, a recurrent net uses the input x<2> along with some information, or learning, that it computed at timestamp 1.

Thus the activation value a<1> from timestamp 1 is passed to timestamp 2. This continues until the last timestamp t, where the input is x<t> and the output y<t> is calculated.
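
In the course’s notation, the computation at each timestamp can be summarized in two equations, where a<0> is usually a vector of zeros and g is an activation function such as tanh or Sigmoid:

a<t> = g(Waa · a<t-1> + Wax · x<t> + ba)
y<t> = g(Wya · a<t> + by)

The same weight matrices Waa, Wax, and Wya are shared across all timestamps.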

A vocabulary or dictionary of words is already fed into the system and used for this prediction.
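
As a rough sketch of what that means, each word can be represented as a one-hot vector over the vocabulary, and that vector becomes the input x<t> at its timestamp. The tiny word list below is made up for illustration:

```python
import numpy as np

vocab = ["a", "and", "harry", "hermione", "invented", "new", "potter", "spell"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Represent a word as a one-hot vector over the vocabulary."""
    x = np.zeros(len(vocab))
    x[index[word.lower()]] = 1.0
    return x

print(one_hot("Harry"))   # [0. 0. 1. 0. 0. 0. 0. 0.]
```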

Limitations:

One limitation of this network is that, when predicting an output at some timestamp t, the RNN uses information from earlier in the sequence but not from later in the sequence. That means when predicting the output y<3>, the RNN does not utilize the information from x<4>, x<5>, and so on.

So, how is that a problem? 

There is another intriguing example from Professor Andrew Ng to explain this.

An example input sentence is as below:

He said, “Teddy Roosevelt was a great president”.

Now, to decide whether “Teddy” is part of a President’s name in the above sentence, the earlier words in the sequence (“He”, “said”) are not enough.

We need information from the later words too, right? What if we feed another input sentence as below:

He said, “Teddy Bears are on sale”.

So, the first 3 input words (“He”, “said”, and “Teddy”) are the same in the two sentences above. From these three inputs alone, it is not possible to tell whether “Teddy” is the name of a president: it is in the first example but not in the second.

Solution:

There is always a solution to a problem, and the answer to this RNN problem is the Bidirectional RNN. But this post was meant to cover the basic similarities and distinctions between Feedforward, Backpropagation, and Recurrent Neural Nets. I will work more on the recurrent neural network in some of my following posts.

Thank You!
