A major characteristic of all the densely connected and convolutional neural networks we’ve worked with so far is that they have no memory. Each input shown to them is processed independently, with no state kept between inputs.
With networks like these (feedforward and convolutional), in order to process a sequence or a temporal series of data points, you have to show the entire sequence to the network at once, i.e. turn it into a single data point by flattening it.
In contrast, as you are reading this very sentence, you are processing it word by word while keeping memories of what came before; this gives you a fluid representation of the meaning the sentence conveys.
Understanding RNNs
Human intelligence processes information incrementally while maintaining an internal model of what it’s processing, built from past information & constantly updated as new information comes in.
A Recurrent Neural Network (RNN) adopts the same principle, albeit in an extremely simplified version: it processes sequences by iterating through the sequence elements and maintaining a state that contains information relative to what it has seen so far. In effect, an RNN is a type of neural network that has an internal loop.
The state of the RNN is reset between processing 2 different independent sequences, so you still consider 1 sequence to be a single data point: a single input to the network. What changes is that this data point is no longer processed in a single step; rather the network internally loops over sequence elements.
Let’s go ahead and implement a simple dummy RNN to make this concrete.
A Dummy RNN
Our dummy RNN needs a starting point.
Let’s say the state at time t = 0 is just 0:
state_t = 0
With our starting point established, we want the network to iterate over the elements of a sequence and do something at each step:
for input_t in input_sequence:
At each iteration, the output is computed from the current input and the current state, and that output becomes the state for the next iteration: output(t) = f(input(t), state(t)), where state(t) = output(t-1)
output_t = f(input_t, state_t)
state_t = output_t
f in this case is just some function that transforms the current input and the current state into an output; in a real RNN layer it’s parameterized by weights that are learned during training.
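To make the loop concrete, here’s a minimal NumPy sketch of this dummy RNN. The sizes (100 timesteps, 32 input features, 64 output features) and the particular choice of f (a tanh over two dot products) are just illustrative assumptions so the code runs:

import numpy as np

timesteps = 100          # number of timesteps in the input sequence
input_features = 32      # dimensionality of each sequence element
output_features = 64     # dimensionality of the output (and of the state)

input_sequence = np.random.random((timesteps, input_features))  # dummy input data
state_t = np.zeros((output_features,))                          # initial state: all zeros

# Random weights; a real RNN layer would learn these during training
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features,))

successive_outputs = []
for input_t in input_sequence:
    # f combines the current input with the current state
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t   # the output becomes the state for the next step

final_outputs = np.stack(successive_outputs, axis=0)  # shape: (timesteps, output_features)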
Different RNN Layers
Here are some different ways to configure an RNN layer, and a quick summary of what they do. For this example, we’ll say the number of features is 14, and the model outputs a single 16-dimensional vector summarizing the entire input sequence.
An RNN layer that can process sequences of any length
from tensorflow import keras
from tensorflow.keras import layers

num_features = 14
inputs = keras.Input(shape=(None, num_features))  # None: the timestep dimension is left unspecified
outputs = layers.SimpleRNN(16)(inputs)
This is super useful if your model is meant to process sequences of variable length. However, if all of your sequences have the same length, I recommend specifying a complete input shape, since it enables model.summary() to display output length information, which is always nice, and it can unlock some performance optimizations.
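To see the variable-length handling in action, here’s a quick sanity check reusing the inputs and outputs defined just above. The random data and the sequence lengths of 50 and 200 timesteps are arbitrary; arrays are shaped (batch, timesteps, features):

import numpy as np

model = keras.Model(inputs, outputs)
print(model.predict(np.random.random((2, 50, num_features))).shape)   # (2, 16)
print(model.predict(np.random.random((2, 200, num_features))).shape)  # (2, 16)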
An RNN layer that returns only its last output step
num_features = 14
steps = 120
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16, return_sequences=False)(inputs)
This one returns only the output at the last timestep. (That’s also SimpleRNN’s default behavior, so return_sequences=False just makes it explicit.)
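A quick way to see what that gives you is to print the symbolic output shape (None is the batch dimension Keras leaves open):

print(outputs.shape)   # (None, 16): one 16-dimensional summary vector per input sequence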
An RNN layer that returns its full output sequence
num_features = 14
steps = 120
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16, return_sequences=True)(inputs)
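Same sanity check as before, now showing that the time dimension is kept:

print(outputs.shape)   # (None, 120, 16): a 16-dimensional vector for every one of the 120 timesteps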
Sometimes it’s useful to stack several recurrent layers 1 after the other in order to increase the representational power of a network. In a setup like this, you have to get all of the intermediate layers to return a full sequence of outputs.
Stacking RNN layers
inputs = keras.Input(shape=(steps, num_features))
x = layers.SimpleRNN(16, return_sequences=True)(inputs)
x = layers.SimpleRNN(16, return_sequences=True)(x)
outputs = layers.SimpleRNN(16)(x)
In the real world, you’ll rarely work with the SimpleRNN layer. It’s usually too simplistic to be of real use. In particular, SimpleRNN has a major issue: although in theory it should be able to retain, at time t, information about inputs seen many timesteps before, such long-term dependencies prove impossible to learn in practice. This is due to the vanishing gradient problem.
We’ll talk about it more in the next post.
RNN on our temperature problem
Now let’s apply a recurrent model to our temperature problem from the last post and see how it holds up. (Because of SimpleRNN’s limitations, the model below uses an LSTM layer, a more capable recurrent layer we’ll dig into in the next post.)
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.LSTM(16)(inputs)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
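To actually train it, here’s a minimal sketch. I’m assuming the train_dataset and val_dataset objects built in the previous post, plus standard choices of optimizer and loss; swap in whatever your own pipeline uses:

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])   # assumed settings
history = model.fit(train_dataset,                # assumed: windowed dataset from the last post
                    epochs=10,
                    validation_data=val_dataset)  # assumed: validation split from the last post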
Here is the model summary:
And here’s the MAE it came up with:
2.54372239112854
Remember, the feedforward neural network had an MAE of 3.79, and the ConvNet had an MAE of 3.02.
So voilà: RNNs are great at time series problems.