
xgboost time series forecast in R


xgboost, or Extreme Gradient Boosting, is a very convenient algorithm that can be used to solve regression and classification problems. You can check my previous post to learn more about it. It turns out we can also benefit from xgboost when making time series predictions.

The example is in R.




As I will be using the xgboost and caret R packages, I need my data in the form of a data frame. Let’s use the economics dataset from the ggplot2 package.

It contains several monthly time series columns, but for this forecast I will use only one: unemploy.
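A minimal sketch of this step, assuming ggplot2 is installed; the object name `data` is my choice for illustration:

```r
library(ggplot2)  # ships the economics dataset

# Keep only the date index and the series to forecast
data <- economics[, c("date", "unemploy")]

head(data)
range(data$date)  # first and last observed month
```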

Now I will generate index values for my future forecast. Let that be a 12 months prediction.
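One way to sketch this, using base R only. The names `horizon`, `future_dates` and `extended_data` are illustrative assumptions; the future dates are derived from the last observed date rather than typed in by hand:

```r
library(ggplot2)

data <- as.data.frame(economics[, c("date", "unemploy")])

horizon <- 12  # forecast length in months

# seq() returns the last observed date first, so drop that element
future_dates <- seq(max(data$date), by = "month", length.out = horizon + 1)[-1]

# Append the future rows, with NA for the not-yet-known target
extended_data <- rbind(
  data,
  data.frame(date = future_dates, unemploy = NA_real_)
)

tail(extended_data, horizon + 1)
```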

The extended data frame now holds the dates for the forecast, with no values assigned to them yet.

Now we need to take care of the date column. xgboost works on numeric features and cannot use a date column directly, so we split it into several numeric columns that describe the granularity of the time index. In this case, months and years:
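A sketch of the feature step, assuming dplyr and lubridate are available. The block rebuilds the extended data frame so it runs on its own; the column names `months` and `years` follow the naming used in the reader comments further down:

```r
library(ggplot2)
library(dplyr)
library(lubridate)

# Observed series plus 12 future months with an unknown target
data <- economics %>% select(date, unemploy)
future <- data.frame(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data <- bind_rows(data, future)

# Describe each date with numeric calendar features
extended_data_mod <- extended_data %>%
  mutate(months = month(date), years = year(date))

tail(extended_data_mod, 3)
```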

Now we can split the data into training set and prediction set:

In order to use xgboost we need to transform the data into matrix form and extract the target variable. Additionally, we need to drop the date column and use only the newly created ones:
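The split and the matrix conversion could look roughly like this. I am assuming a plain numeric matrix here, which caret::train accepts; the names `train`, `pred`, `x_train`, `x_pred` and `y_train` match the ones used in the comments below:

```r
library(ggplot2)
library(dplyr)
library(lubridate)

# Rebuild the extended data frame with calendar features
data <- economics %>% select(date, unemploy)
future <- data.frame(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data_mod <- bind_rows(data, future) %>%
  mutate(months = month(date), years = year(date))

# Observed rows form the training set, the appended rows the prediction set
n_obs <- nrow(data)
train <- extended_data_mod[seq_len(n_obs), ]
pred <- extended_data_mod[-seq_len(n_obs), ]

# Feature matrices without the date column, plus the target vector
x_train <- as.matrix(select(train, months, years))
x_pred <- as.matrix(select(pred, months, years))
y_train <- train$unemploy

dim(x_train)
dim(x_pred)
```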


xgboost prediction

With the data prepared as in the previous section, we can perform the modeling in the same manner as if we were not dealing with time series data. We need to provide the grid of hyperparameter values for tuning the model. We also specify the cross-validation method with the number of folds and enable parallel computations.

Now we can build the model using caret's tree-based xgboost method ("xgbTree"):
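A sketch of the tuning setup with caret. The specific grid values and the five folds are assumptions chosen for illustration, not tuned recommendations; the grid columns are the seven tuning parameters that caret's "xgbTree" method expects:

```r
library(caret)

# 5-fold cross-validation; allowParallel only has an effect when a
# parallel backend (for example via doParallel) has been registered
xgb_trcontrol <- trainControl(
  method = "cv",
  number = 5,
  allowParallel = TRUE,
  verboseIter = FALSE,
  returnData = FALSE
)

# Every combination of these values is evaluated by cross-validation
xgb_grid <- expand.grid(
  nrounds = c(100, 200),
  max_depth = c(10, 15, 20),
  eta = 0.1,
  gamma = 0,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)

nrow(xgb_grid)  # number of candidate models
```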

Let’s check the best values that were chosen as hyperparameters:

And perform the forecast:
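The modeling steps above can be sketched end to end as follows. This assumes installed versions of caret and xgboost that work together for method = "xgbTree"; the grid is deliberately small so the block runs quickly, and the feature matrices are plain numeric matrices rather than xgb.DMatrix objects:

```r
library(ggplot2)
library(dplyr)
library(lubridate)
library(caret)  # method = "xgbTree" also needs the xgboost package installed

# Rebuild the prepared data
data <- economics %>% select(date, unemploy)
future <- data.frame(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended <- bind_rows(data, future) %>%
  mutate(months = month(date), years = year(date))
train <- extended[seq_len(nrow(data)), ]
pred <- extended[-seq_len(nrow(data)), ]
x_train <- as.matrix(select(train, months, years))
x_pred <- as.matrix(select(pred, months, years))
y_train <- train$unemploy

# Illustrative tuning setup (values are assumptions, not recommendations)
xgb_trcontrol <- trainControl(method = "cv", number = 5, verboseIter = FALSE)
xgb_grid <- expand.grid(
  nrounds = 100, max_depth = c(5, 10), eta = 0.1, gamma = 0,
  colsample_bytree = 1, min_child_weight = 1, subsample = 1
)

set.seed(42)  # make the fold assignment reproducible
xgb_model <- caret::train(
  x = x_train, y = y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree",
  nthread = 1
)

# Hyperparameters of the best cross-validated candidate
xgb_model$bestTune

# Point forecasts for the 12 future months
xgb_pred <- stats::predict(xgb_model, x_pred)
xgb_pred
```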


forecast object

Once we have the predicted values, we can turn the results into a forecast object, like the one we would get from the forecast package. That allows us, for example, to use the forecast::autoplot function to plot the results of the prediction. To do so, we need to define the several components that make up a forecast object.

Now we can easily plot the data.
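A rough sketch of assembling such an object by hand. `as_forecast_object` is a hypothetical helper of mine, not a function from the forecast package, and it relies on the list layout that the package's plotting method reads rather than on a documented constructor. So that the block runs without training a model, the fitted values and the 12-month forecast are naive placeholders standing in for the xgboost output:

```r
library(ggplot2)
library(lubridate)
library(forecast)

# Hypothetical helper: wrap the observed series, in-sample fitted values and
# point forecasts (all ts objects) into a list with class "forecast"
as_forecast_object <- function(x, fitted, mean, method, model = NULL) {
  out <- list(
    model = model,
    method = method,
    mean = mean,
    x = x,
    fitted = fitted,
    residuals = x - fitted
  )
  class(out) <- "forecast"
  out
}

# Observed monthly series, indexed by the year and month of its first date
first_date <- min(economics$date)
x <- ts(economics$unemploy,
        start = c(year(first_date), month(first_date)),
        frequency = 12)

# PLACEHOLDERS for the model output: previous-month values as "fitted",
# and the last observed value repeated as the 12-month "forecast"
fitted_vals <- ts(c(x[1], head(as.numeric(x), -1)),
                  start = start(x), frequency = 12)
next_date <- seq(max(economics$date), by = "month", length.out = 2)[2]
point_fc <- ts(rep(as.numeric(tail(x, 1)), 12),
               start = c(year(next_date), month(next_date)),
               frequency = 12)

fc <- as_forecast_object(x, fitted_vals, point_fc, method = "xgbTree")

# Plot the history together with the forecast
autoplot(fc)
```

In the flow of this post, the fitted values would come from predicting on the training matrix and the forecast from predicting on the prediction matrix, each converted to a ts with the matching start and frequency.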


xgboost forecast with regressors

A nice thing about forecasting with xgboost is that we can easily use additional regressors alongside our time series. To do so, we just need to extend the xgboost data matrices accordingly.

Of course we need to make sure that the dates of our predictors are aligned with the initial time series dataset.
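A sketch of extending the matrices with one regressor. Two assumptions of mine: the regressor is the `pop` column of the same economics dataset, and the prediction set is the last 12 observed months held out from training, so that the regressor values for the forecast period are actually known. Because the regressor sits in the same data frame as the target, the dates are aligned by construction:

```r
library(ggplot2)
library(dplyr)
library(lubridate)

horizon <- 12

# Target, regressor and calendar features share the same date rows
data <- economics %>%
  select(date, unemploy, pop) %>%
  mutate(months = month(date), years = year(date))

# Hold out the last 12 months as the period to predict
n <- nrow(data)
train <- data[seq_len(n - horizon), ]
pred <- data[(n - horizon + 1):n, ]

# The regressor becomes one more column in both matrices
x_train <- as.matrix(select(train, months, years, pop))
x_pred <- as.matrix(select(pred, months, years, pop))
y_train <- train$unemploy

# Both matrices must have the same columns in the same order
stopifnot(identical(colnames(x_train), colnames(x_pred)))
```

The model fitting and prediction calls stay the same as before; only the matrices gain a column.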

11 thoughts on “xgboost time series forecast in R”

  • Looking into predicting stocks. I have tried out the prophet package for time series which is good. Had no luck with adding regressors. This example is definitely helpful. Do you have any other suggestions on predicting time series with regressors? I know you mentioned arima, but wondering if you had other suggestions.

    • In principle yes. It depends on the data granularity, and of course the more data points you have, the more sensible a prediction you can expect.

  • Is there an easy way to make this solution work on weekly (or even daily) data?
    I have tried changing the frequency of the ts objects (created in the fitted and xgb_forecast variables) to a weekly frequency or even tried to convert the xts objects to a ts without considering a freq but had no luck…

    • Yes! The key is to specify your artificial time-related columns properly. For daily data use daily granularity, for weekly data use weekly granularity, etc.

      For some daily dataset:
      extended_data_mod <- extended_data %>%
      dplyr::mutate(days = lubridate::day(Date),
      months = lubridate::month(Date),
      years = lubridate::year(Date))

      Then for the fitted values and the prediction you need to pass a daily index:
      fitted <- xgb_model %>%
      stats::predict(x_train) %>%
      stats::ts(start = c(lubridate::year(min(train$Date)), lubridate::yday(min(train$Date))),
      end = c(lubridate::year(max(train$Date)), lubridate::yday(max(train$Date))),
      frequency = 365)

      xgb_forecast <- xgb_pred %>%
      stats::ts(start = c(lubridate::year(min(pred$Date)), lubridate::yday(min(pred$Date))),
      end = c(lubridate::year(max(pred$Date)), lubridate::yday(max(pred$Date))),
      frequency = 365)

      Similarly for any other data granularity.

  • xgb_model <- caret::train(
    x_train, y_train,
    trControl = xgb_trcontrol,
    tuneGrid = xgb_grid,
    method = "xgbTree",
    nthread = 1
    )
    For this line it gives the following error :
    Error in [.xgb.DMatrix(x, 0, , drop = FALSE) :
    unused argument (drop = FALSE)

  • I have an error every time in this part:
    xgb_model = train(
    x = x_train,
    y = y_train,
    trControl = xgb_trcontrol,
    tuneGrid = xgb_grid,
    method = "xgbTree",
    nthread = 1
    )

    error: Error in [.xgb.DMatrix(x, 0, , drop = FALSE) :
    unused argument (drop = FALSE)

    Maybe someone had the same one?
