# xgboost time series forecast in R

12/04/2020 11:32 AM · Alice · Tags: Forecasting, R, Xgb

xgboost, or Extreme Gradient Boosting, is a very convenient algorithm that can be used to solve regression and classification problems. You can check my previous post to learn more about it. It turns out we can also benefit from xgboost when doing time series predictions. The example is in R.

## HOW TO

### data

As I will be using the xgboost and caret R packages, I need my data to be provided as a dataframe. Let's use the economics dataset from the ggplot2 package. There are several time-related columns, but for this time series forecast I will use one: unemploy.

```r
data <- economics %>% dplyr::select(date, unemploy)
```

Now I will generate index values for my future forecast. Let that be a 12-month prediction.

```r
extended_data <- data %>%
  rbind(tibble::tibble(date = seq(from = lubridate::as_date("2015-05-01"),
                                  by = "month", length.out = 12),
                       unemploy = rep(NA, 12)))
```

Now my extended dataframe already has the dates specified for the forecast, with no values assigned. Next we need to take care of the date column. xgboost does not handle date columns well, so we need to split it into several columns describing the granularity of the time.
In this case, months and years:

```r
extended_data_mod <- extended_data %>%
  dplyr::mutate(.,
                months = lubridate::month(date),
                years = lubridate::year(date))
```

Now we can split the data into a training set and a prediction set:

```r
train <- extended_data_mod[1:nrow(data), ] # initial data
pred <- extended_data_mod[(nrow(data) + 1):nrow(extended_data), ] # extended time index
```

In order to use xgboost we need to transform the data into matrix form and extract the target variable. Additionally, we need to get rid of the date column and use only the newly created ones:

```r
x_train <- xgboost::xgb.DMatrix(as.matrix(train %>% dplyr::select(months, years)))
x_pred <- xgboost::xgb.DMatrix(as.matrix(pred %>% dplyr::select(months, years)))
y_train <- train$unemploy
```

### xgboost prediction

With the data prepared as in the previous section, we can perform the modeling in the same manner as if we were not dealing with time series data. We need to provide the space of parameters for model tuning. We specify the cross-validation method with the number of folds and also enable parallel computation.
```r
xgb_trcontrol <- caret::trainControl(
  method = "cv",
  number = 5,
  allowParallel = TRUE,
  verboseIter = FALSE,
  returnData = FALSE
)

xgb_grid <- base::expand.grid(
  list(
    nrounds = c(100, 200),
    max_depth = c(10, 15, 20), # maximum depth of a tree
    colsample_bytree = c(0.5), # subsample ratio of columns when constructing each tree
    eta = 0.1, # learning rate
    gamma = 0, # minimum loss reduction
    min_child_weight = 1, # minimum sum of instance weight (hessian) needed in a child
    subsample = 1 # subsample ratio of the training instances
  ))
```

Now we can build the model using the tree models:

```r
xgb_model <- caret::train(
  x_train, y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree",
  nthread = 1
)
```

Let's check the best values that were chosen as hyperparameters:

```r
xgb_model$bestTune
```

And perform the forecast:

```r
xgb_pred <- xgb_model %>% stats::predict(x_pred)
```

### forecast object

As we have the values predicted, we can turn the results into a forecast object, like the one we would get when using the forecast package. That will allow us, e.g., to use the forecast::autoplot function to plot the results of the prediction.
In order to do so, we need to define the several objects that build up a forecast object.

```r
# prediction on the train set
fitted <- xgb_model %>%
  stats::predict(x_train) %>%
  stats::ts(start = zoo::as.yearmon(min(train$date)),
            end = zoo::as.yearmon(max(train$date)),
            frequency = 12)

# prediction in the form of a ts object
xgb_forecast <- xgb_pred %>%
  stats::ts(start = zoo::as.yearmon(min(pred$date)),
            end = zoo::as.yearmon(max(pred$date)),
            frequency = 12)

# original data as a ts object
ts <- y_train %>%
  stats::ts(start = zoo::as.yearmon(min(train$date)),
            end = zoo::as.yearmon(max(train$date)),
            frequency = 12)

# forecast object
forecast_list <- list(
  model = xgb_model$modelInfo,
  method = xgb_model$method,
  mean = xgb_forecast,
  x = ts,
  fitted = fitted,
  residuals = as.numeric(ts) - as.numeric(fitted)
)
class(forecast_list) <- "forecast"
```

Now we can easily plot the data:

```r
forecast::autoplot(forecast_list)
```

### xgboost forecast with regressors

A nice thing about forecasting with xgboost is that we can easily use additional regressors with our time series.
To do so, we just need to extend the xgboost data matrices accordingly, binding the regressor columns to the time features:

```r
x_train <- xgboost::xgb.DMatrix(cbind(
  as.matrix(train %>% dplyr::select(months, years)),
  reg_train))
x_pred <- xgboost::xgb.DMatrix(cbind(
  as.matrix(pred %>% dplyr::select(months, years)),
  reg_pred))
y_train <- train$unemploy
```

Of course we need to make sure that the dates of our predictors are aligned with the initial time series dataset.
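As an illustration of that alignment, here is a base-R sketch with purely hypothetical names (the toy `cpi` indicator, `reg_train`): an external monthly indicator is merged by date with the training frame, so each regressor row matches its time index before being bound into the xgb.DMatrix.

```r
# Hypothetical example: align an external monthly indicator with the
# training dates via merge(), then extract it as a regressor matrix.
dates_train <- seq(as.Date("2015-01-01"), by = "month", length.out = 3)
train_toy <- data.frame(date = dates_train, unemploy = c(8900, 8700, 8600))
indicator <- data.frame(date = dates_train, cpi = c(236.1, 236.9, 236.6))

merged <- merge(train_toy, indicator, by = "date")  # inner join on date
reg_train <- as.matrix(merged[, "cpi", drop = FALSE])
nrow(reg_train) == nrow(train_toy)  # rows stay aligned with the time index
```

An inner join drops any dates missing from either side, which is usually what you want here: a regressor row without a matching observation would silently shift every later row out of alignment.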

## 10 thoughts on “xgboost time series forecast in R”

• Nick says:

Looking into predicting stocks. I have tried out the prophet package for time series which is good. Had no luck with adding regressors. This example is definitely helpful. Do you have any other suggestions on predicting time series with regressors? I know you mentioned arima, but wondering if you had other suggestions.

• Joel says:

Why does the forecast period (blue) look like a copy of the last observed period (black)?

• Alice says:

Because the data was randomly generated. I changed the data to make the example more meaningful.

• Abhinav says:

Does forecasting work even if the data covers less than 1 year of history?

• Alice says:

In principle yes; it depends on data granularity, and of course the more points there are, the more sensible a prediction can be expected.

• Panos says:

Is there an easy way to convert this solution to work on weekly data (or even daily)?
I have tried changing the frequency of the ts objects (created in the fitted and xgb_forecast variables) to a weekly frequency or even tried to convert the xts objects to a ts without considering a freq but had no luck…

• Alice says:

Yes! The key is to specify your artificial time-related columns properly. If daily data, daily granularity; if weekly data, weekly granularity, etc.

For some daily dataset:
```r
extended_data_mod <- extended_data %>%
  dplyr::mutate(.,
                days = lubridate::day(Date),
                months = lubridate::month(Date),
                years = lubridate::year(Date))
```

Then for the fitted values and prediction you need to pass daily index:
```r
fitted <- xgb_model %>%
  stats::predict(x_train) %>%
  stats::ts(start = c(lubridate::year(min(train$Date)), lubridate::yday(min(train$Date))),
            end = c(lubridate::year(max(train$Date)), lubridate::yday(max(train$Date))),
            frequency = 365)

xgb_forecast <- xgb_pred %>%
  stats::ts(start = c(lubridate::year(min(pred$Date)), lubridate::yday(min(pred$Date))),
            end = c(lubridate::year(max(pred$Date)), lubridate::yday(max(pred$Date))),
            frequency = 365)
```

Similarly for any other data granularity.
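For weekly data, for example, the artificial columns could be ISO week and year; a sketch in base R (adapt the column names to your dataset):

```r
# Sketch: ISO week-of-year and ISO year features for a weekly series.
weekly_dates <- seq(as.Date("2021-01-04"), by = "week", length.out = 4)  # Mondays
weeks <- as.integer(strftime(weekly_dates, "%V"))  # ISO 8601 week number
years <- as.integer(strftime(weekly_dates, "%G"))  # ISO 8601 year
```

Note that `%G` (ISO year) should be paired with `%V`, since the ISO year can differ from the calendar year around New Year.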

• Shubham Karne says:

```r
xgb_model <- caret::train(
  x_train, y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree"
)
```

For this line it gives the following error:

```
Error in `[.xgb.DMatrix`(x, 0, , drop = FALSE) :
  unused argument (drop = FALSE)
```

• Alice says:

That is an issue with the caret package, introduced in some newer version; try downgrading.

• Vladimir says:

I have an error every time in this part:
```r
xgb_model = train(
  x = x_train,
  y = y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree",
```

error:

```
Error in `[.xgb.DMatrix`(x, 0, , drop = FALSE) :
```