## lstm time series prediction in R

It turns out that deep learning, with all its power, can also be used for forecasting. Especially the **LSTM **(Long Short Term Memory) model, which proved to be useful while solving problems involving sequences with autocorrelation.

Long Short Term Memory networks are kind of *Recurrent Neural Networks (RNN)* that are capable of learning long-term dependencies. LSTM enables to persist long term states in addition to short term, which tradicional RNN’s have difficulty with. LSTMs are quite useful in time series prediction tasks involving autocorrelation, because of their ability to maintain state and recognize patterns over the length of the series.

Here I show how to implement forecasting LSTM model using R language.

Contents

**HOW TO**

LSTM model is available in the **keras** R package, which runs on top of the Tensorflow.

Before we start we need to install and load both of those:

1 2 3 4 |
library(keras) library(tensorflow) install_keras() install_tensorflow(version = "nightly") |

**data** **preparation**

For the purpose of this example I used the *economics *dataset, which can be found in the **ggplot2** package. I want to predict the unemployment in the following 12 months (*unemploy *column).

1 2 |
library(ggplot2) head(economics) |

LSTM model requires to rescale the input data. Mean and standard deviation of the training dataset can be used as the scaling coefficients to scale both the training and testing data sets as well as the predicted values. This way we ensure that the scaling does not impact the model.

1 |
scale_factors <- c(mean(economics$unemploy), sd(economics$unemploy)) |

If you wish to train-test the model, you should start with data split. As I focused on creating the prediction, not accuracy of the model itself, I used full dataset for training.

1 2 3 |
scaled_train <- economics %>% dplyr::select(unemploy) %>% dplyr::mutate(unemploy = (unemploy - scale_factors[1]) / scale_factors[2]) |

LSTM algorithm creates predictions based on the *lagged* values. That means that it needs to look back to as many previous values as many points we wish to predict. As we want to do a 12 months forecast, we need to base each prediction on 12 data points.

For demonstration purpose let’s say our series has 10 data points [1, 2, 3, …, 10] and we want to predict 3 values. Then our training data looks like this:

1 2 3 4 5 |
[1, 2, 3] -> [4, 5, 6] [2, 3, 4] -> [5, 6, 7] [3, 4, 5] -> [6, 7, 8] [4, 5, 6] -> [7, 8, 9] [5, 6, 7] -> [8, 9, 10] |

Predictors and target take form of

1 2 3 |
3 4 5 6 7 6 7 8 9 10 X = [ 2 3 4 5 6 ] Y = [ 5 6 7 8 9 ] 1 2 3 4 5 4 5 6 7 8 |

Additionally keras LSTM expects specific tensor format of shape of a 3D array of the form [*samples, timesteps, features*] for predictors (X) and for target (Y) values:

*samples*specifies the number of observations which will be processed in batches.*timesteps*tells us the number of time steps (lags). Or in other words how many units back in time we want our network to see.*features*specifies number of predictors (*1*for univariate series and*n*for multivariate).

In case of predictors that translates to an array of dimensions: *(nrow(data) – lag – prediction + 1, 12, 1)*, where lag = prediction = 12.

1 2 |
prediction <- 12 lag <- prediction |

We lag the data 11 times, so that each prediction is based on 12 values, and arrange lagged values into columns. Then we transform it into the desired 3D form.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
scaled_train <- as.matrix(scaled_train) # we lag the data 11 times and arrange that into columns x_train_data <- t(sapply( 1:(length(scaled_train) - lag - prediction + 1), function(x) scaled_train[x:(x + lag - 1), 1] )) # now we transform it into 3D form x_train_arr <- array( data = as.numeric(unlist(x_train_data)), dim = c( nrow(x_train_data), lag, 1 ) ) |

Basically every column is a lagged version of the previous one – the last one is lagged by 11 steps comparing to the first one.

And final version:

Data was turned into a 3D array. As we have only one predictor, last dimension equals to one.

Now we apply similar transformation for the Y values.

1 2 3 4 5 6 7 8 9 10 11 12 13 |
y_train_data <- t(sapply( (1 + lag):(length(scaled_train) - prediction + 1), function(x) scaled_train[x:(x + prediction - 1)] )) y_train_arr <- array( data = as.numeric(unlist(y_train_data)), dim = c( nrow(y_train_data), prediction, 1 ) ) |

In the same manner we need to prepare input data for the prediction, which are in fact last 12 observations from our training set.

1 |
x_test <- economics$unemploy[(nrow(scaled_train) - prediction + 1):nrow(scaled_train)] |

We need to scale and transform it.

1 2 3 4 5 6 7 8 9 10 11 12 |
# scale the data with same scaling factors as for training x_test_scaled <- (x_test - scale_factors[1]) / scale_factors[2] # this time our array just has one sample, as we intend to perform one 12-months prediction x_pred_arr <- array( data = x_test_scaled, dim = c( 1, lag, 1 ) ) |

**lstm prediction**

We can build a LSTM model using the **keras_model_sequential** function and adding layers on top of that. The first LSTM layer takes the required input shape, which is the* [samples, timesteps, features]*. We set for both layers **return_sequences = TRUE** and **stateful = TRUE**. The second layer is the same with the exception of **batch_input_shape**, which only needs to be specified in the first layer.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
lstm_model <- keras_model_sequential() lstm_model %>% layer_lstm(units = 50, # size of the layer batch_input_shape = c(1, 12, 1), # batch size, timesteps, features return_sequences = TRUE, stateful = TRUE) %>% # fraction of the units to drop for the linear transformation of the inputs layer_dropout(rate = 0.5) %>% layer_lstm(units = 50, return_sequences = TRUE, stateful = TRUE) %>% layer_dropout(rate = 0.5) %>% time_distributed(keras::layer_dense(units = 1)) |

You can also decide to try out different activation functions with *activation *parameter (hyperbolic tangent *tanh *is the default one).

Also choose loss function for the optimization, type of optimizer and metric for assessing the model performance. Info about different optimizers can he found here.

1 2 3 4 |
lstm_model %>% compile(loss = 'mae', optimizer = 'adam', metrics = 'accuracy') summary(lstm_model) |

Next, we can fit our stateful LSTM. We set **shuffle = FALSE** to preserve sequences of time series.

1 2 3 4 5 6 7 8 |
lstm_model %>% fit( x = x_train_arr, y = y_train_arr, batch_size = 1, epochs = 20, verbose = 0, shuffle = FALSE ) |

And perform the prediction:

1 2 3 4 5 6 |
lstm_forecast <- lstm_model %>% predict(x_pred_arr, batch_size = 1) %>% .[, , 1] # we need to rescale the data to restore the original values lstm_forecast <- lstm_forecast * scale_factors[2] + scale_factors[1] |

LSTM model is more tricky than regular time series models, because you do not pass the explicit number of prediction points for the forecast. Instead you need to design your model in a way that it forecasts the desired number of periods. If you wish to predict more, you need to provide additional columns in your prediction set, containing values predicted for the previous periods. Following this example, predicting 13th month ahead requires “knowing” result for the 1st month ahead. The other option is to rebuild the model to predict 13 values instead of 12.

**forecast object**

As we have the values predicted, we can turn the results into the **forecast** object, as we would get if using the *forecast* package. That will allow i.e. to use the *forecast*::*autoplot* function to plot the results of the prediction. In order to do so, we need to define several objects that build a **forecast** object.

**prediction on a train set**

1 2 |
fitted <- predict(lstm_model, x_train_arr, batch_size = 1) %>% .[, , 1] |

Prediction on a training set will provide us with 12 results for each input period. So we need to transform the data to get only one prediction per each date.

1 2 3 4 5 6 7 8 9 |
if (dim(fitted)[2] > 1) { fit <- c(fitted[, 1], fitted[dim(fitted)[1], 2:dim(fitted)[2]]) } else { fit <- fitted[, 1] } # additionally we need to rescale the data fitted <- fit * scale_factors[2] + scale_factors[1] nrow(fitted) # 562 |

Due to the fact that our forecast starts with 12 months offset, we need to provide artificial (or real) values for those months:

1 2 |
# I specify first forecast values as not available fitted <- c(rep(NA, lag), fitted) |

**prediction in a form of ts object**

We need to change the predicted values into a time series object.

1 2 3 4 5 |
lstm_forecast <- timetk::tk_ts(lstm_forecast, start = c(2015, 5), end = c(2016, 4), frequency = 12 ) |

**input series**

Additionally we need to transform the economics data into a time series object.

1 2 3 4 |
input_ts <- timetk::tk_ts(economics$unemploy, start = c(1967, 7), end = c(2015, 4), frequency = 12) |

**forecast object**

Finally we can define the forecast object:

1 2 3 4 5 6 7 8 9 10 |
forecast_list <- list( model = NULL, method = "LSTM", mean = lstm_forecast, x = input_ts, fitted = fitted, residuals = as.numeric(input_ts) - as.numeric(fitted) ) class(forecast_list) <- "forecast" |

Now we can easily plot the data:

1 |
forecast::autoplot(forecast_list) |

**lstm prediction with regressors**

Handling regressors in LSTM goes down to treating the series as multivariate instead of univariate. Let’s create some random regressors for this example:

1 |
reg <- 100 * runif(nrow(economics)) |

As with training set, we need to scale the regressors as well

1 2 3 4 5 6 7 8 9 10 |
scale_factors_reg <- list( mean = mean(reg), sd = sd(reg) ) scaled_reg <- (reg - scale_factors_reg$mean)/scale_factors_reg$sd # additionally 12 values for the forecast scaled_reg_prediction <- (reg[(length(reg) -12): length(reg)] - scale_factors_reg$mean) /scale_factors_reg$sd |

Now we can add them to the training data and transform to tensors as previously.

1 2 3 4 5 6 7 8 9 10 11 |
# combine training data with regressors x_train <- cbind(scaled_train, scaled_reg) x_train_data <- list() # transform the data into lagged columns for (i in 1:ncol(x_train)) { x_train_data[[i]] <- t(sapply( 1:(length(x_train[, i]) - lag - prediction + 1), function(x) x_train[x:(x + lag - 1), i] )) } |

This time we end up with 2 records in our list – input data in the exact same for as previously and regressors following the same logic.

Again we transform this list into a 3D array.

1 2 3 4 5 6 7 8 |
x_train_arr <- array( data = as.numeric(unlist(x_train_data)), dim = c( nrow(x_train_data[[1]]), lag, 2 ) ) |

We also need to modify the prediction data to include regressors in the same manner as for training.

1 2 3 4 5 6 7 8 9 10 11 12 |
# combine the data with regressors x_test_data <- c(x_test_scaled, scaled_reg_prediction) # transform to tensor x_pred_arr <- array( data = x_test_data, dim = c( 1, lag, 2 ) ) |

Rest of the modeling stays the same.

Many thanks for this article.

I am comfortable with R. Python’s system management and version compatibilities unsurmountable for many R users.

It is a lot easier to install TF and keras as root user as installing and configuring for non-admin user. As root user, everything ran on the first go.

I have got a project on sparse time series project and I am/was determined to stick to R whatever it takes, and your article was perfect in every manner.

Just few things I have to guess to instal forecast and timetk.

Thanks once again and really obliged.

Good evening!

In the section “lstm prediction with regressors”, where it says # combine training data with regressors

x_train <- cbind(scaled_train, scaled_reg)

x_train_data <- list()

Which ones are the vectors "scaled_train " and "scaled_reg"?

Thanks a lot for your help.

Hi,

“scaled_train” object you have in “data preparation” section, but generally it is a standardized time series.

scaled_train <- economics %>%

dplyr::select(unemploy) %>%

dplyr::mutate(unemploy = (unemploy – scale_factors[1]) / scale_factors[2])

“scaled_reg” is described in “lstm prediction with regressors”. Here again we have a standardize regressor for the prediction.

I’ve donde a lot of search to find tutorials for apply lstm in ts forecast, your is the best one.

Very helpful. I’ll be following your work.

Many thanks for this.

I’m trying to reproduce your work here, but am running into the following:

> lstm_model <- keras_model_sequential()

*** caught illegal operation ***

address 0x187ecf874, cause 'illegal trap'

Traceback:

1: py_module_import(module, convert = convert)

2: import(module)

3: doTryCatch(return(expr), name, parentenv, handler)

4: tryCatchOne(expr, names, parentenv, handlers[[1L]])

5: tryCatchList(expr, classes, parentenv, handlers)

6: tryCatch(import(module), error = clear_error_handler())

7: py_resolve_module_proxy(x)

8:

`$.python.builtin.module`

(keras, "models")9: keras$models

10: keras_model_sequential()

Possible actions:

1: abort (with core dump, if enabled)

2: normal R exit

3: exit R without saving workspace

4: exit R saving workspace

Selection:

Any thoughts how to fix this? I'm not that familiar with Python unfortunately. Working on a MacBook Pro M1.

Thank you for sharing way too nice model!

I have a unsolved problem to develop a LSTM time series model.

I want to develop ‘multivariate LSTM prediction model’, but I don’t know how to coordinate the feature parameter to develop it…

And I don’t know how to prepare for multivariate datasets to apply into the model…

Can I get to know how to solve this problem?