A major characteristic of all the densely connected and convolutional neural networks we’ve worked with so far is that they have no memory. Each input shown to them is processed independently, with no state kept between inputs.
With networks like these (feedforward and convolutional), in order to process a sequence or a temporal series of data points, you have to show the entire sequence to the network at once, i.e. turn it into a single data point by flattening it.
In contrast, as you are reading this very sentence, you are processing it word by word while keeping memories of what came before; this gives you a fluid representation of the meaning the sentence conveys.
Understanding RNNs
Human intelligence processes information incrementally while maintaining an internal model of what it’s processing, built from past information & constantly updated as new information comes in.
A Recurrent Neural Network (RNN) adopts the same principle, albeit in an extremely simplified version: it processes sequences by iterating through the sequence elements and maintaining a state that contains information relative to what it has seen so far. In effect, an RNN is a type of neural network that has an internal loop.
The state of the RNN is reset between processing 2 different independent sequences, so you still consider 1 sequence to be a single data point: a single input to the network. What changes is that this data point is no longer processed in a single step; rather the network internally loops over sequence elements.
Let’s go ahead and implement a simple dummy RNN to make this concrete.
A Dummy RNN
Our dummy RNN needs a starting point.
Let’s say the state at time t = 0 is just 0:
state_t = 0
With our starting point established, we want the network to iterate over the elements of a sequence and do something at each step:
for input_t in input_sequence:
At each iteration, the output is computed from the current input and the current state, and that output becomes the state for the next iteration: output(t) = f(input(t), state(t)), where state(t) = output(t-1)
output_t = f(input_t, state_t)
state_t = output_t
f in this case is just some function that transforms the current input and the current state into an output; in a real RNN layer it’s parameterized by weights that are learned during training.
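To make the loop concrete, here’s a minimal NumPy sketch of this dummy RNN. The sizes (100 timesteps, 32 input features, 64 output features) and the particular choice of f (a tanh over two dot products) are just illustrative assumptions so the code runs:

import numpy as np

timesteps = 100          # number of timesteps in the input sequence
input_features = 32      # dimensionality of each sequence element
output_features = 64     # dimensionality of the output (and of the state)

input_sequence = np.random.random((timesteps, input_features))  # dummy input data
state_t = np.zeros((output_features,))                          # initial state: all zeros

# Random weights; a real RNN layer would learn these during training
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features,))

successive_outputs = []
for input_t in input_sequence:
    # f combines the current input with the current state
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    successive_outputs.append(output_t)
    state_t = output_t   # the output becomes the state for the next step

final_outputs = np.stack(successive_outputs, axis=0)  # shape: (timesteps, output_features)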
Different RNN Layers
Here are some different ways to configure an RNN layer, and a quick summary of what they do. For this example, we’ll say the number of features is 14, and the model outputs a single 16-dimensional vector summarizing the entire input sequence.
An RNN layer that can process sequences of any length
from tensorflow import keras
from tensorflow.keras import layers

num_features = 14
inputs = keras.Input(shape=(None, num_features))  # None: the timestep dimension is left unspecified
outputs = layers.SimpleRNN(16)(inputs)
This is super useful if your model is meant to process sequences of variable length. However, if all of your sequences have the same length, I recommend specifying a complete input shape, since it enables model.summary() to display output length information, which is always nice, and it can unlock some performance optimizations.
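To see the variable-length handling in action, here’s a quick sanity check reusing the inputs and outputs defined just above. The random data and the sequence lengths of 50 and 200 timesteps are arbitrary; arrays are shaped (batch, timesteps, features):

import numpy as np

model = keras.Model(inputs, outputs)
print(model.predict(np.random.random((2, 50, num_features))).shape)   # (2, 16)
print(model.predict(np.random.random((2, 200, num_features))).shape)  # (2, 16)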
An RNN layer that returns only its last output step
num_features = 14
steps = 120
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16, return_sequences=False)(inputs)
This one returns only the output at the last timestep. (That’s also SimpleRNN’s default behavior, so return_sequences=False just makes it explicit.)
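A quick way to see what that gives you is to print the symbolic output shape (None is the batch dimension Keras leaves open):

print(outputs.shape)   # (None, 16): one 16-dimensional summary vector per input sequence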
An RNN layer that returns its full output sequence
num_features = 14
steps = 120
inputs = keras.Input(shape=(steps, num_features))
outputs = layers.SimpleRNN(16, return_sequences=True)(inputs)
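Same sanity check as before, now showing that the time dimension is kept:

print(outputs.shape)   # (None, 120, 16): a 16-dimensional vector for every one of the 120 timesteps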
Sometimes it’s useful to stack several recurrent layers 1 after the other in order to increase the representational power of a network. In a setup like this, you have to get all of the intermediate layers to return a full sequence of outputs.
Stacking RNN layers
inputs = keras.Input(shape=(steps, num_features))
x = layers.SimpleRNN(16, return_sequences=True)(inputs)
x = layers.SimpleRNN(16, return_sequences=True)(x)
outputs = layers.SimpleRNN(16)(x)
In the real world, you’ll rarely work with the SimpleRNN layer. It’s usually too simplistic to be of real use. In particular, SimpleRNN has a major issue: although in theory it should be able to retain, at time t, information about inputs seen many timesteps before, such long-term dependencies prove impossible to learn in practice. This is due to the vanishing gradient problem.
We’ll talk about it more in the next post.
RNN on our temperature problem
Now let’s apply a recurrent model to our temperature problem from the last post and see how it holds up. (Because of SimpleRNN’s limitations, the model below uses an LSTM layer, a more capable recurrent layer we’ll dig into in the next post.)
inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.LSTM(16)(inputs)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
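To actually train it, here’s a minimal sketch. I’m assuming the train_dataset and val_dataset objects built in the previous post, plus standard choices of optimizer and loss; swap in whatever your own pipeline uses:

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])   # assumed settings
history = model.fit(train_dataset,                # assumed: windowed dataset from the last post
                    epochs=10,
                    validation_data=val_dataset)  # assumed: validation split from the last post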
Here is the model summary:
And here’s the MAE it came up with:
2.54372239112854
Remember, the feedforward neural network had an MAE of 3.79, and the ConvNet had an MAE of 3.02.
So voilà: RNNs are great at time series problems.