xgboost time series forecast in R

xgboost, or Extreme Gradient Boosting, is a very convenient algorithm that can be used to solve regression and classification problems. You can check my previous post to learn more about it. It turns out we can also benefit from xgboost when making time series predictions.

The example is in R.

HOW TO

data

As I will be using the xgboost and caret R packages, I need my data in the form of a data frame. Let’s use the economics dataset from the ggplot2 package.

The dataset contains several time series columns, but for this forecast I will use just one: unemploy.
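A minimal sketch of this step, assuming the ggplot2 and dplyr packages are installed. The object name data is my own choice for illustration.

```r
library(dplyr)  # provides %>% and select()

# economics ships with ggplot2; keep only the date index and the target series
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)

# Inspect the structure: a Date column and a numeric column
str(data)
```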

Now I will generate index values for the future forecast. Let that be a 12-month prediction.

Now my extended dataframe already has the dates specified for the forecast, with no values assigned.
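The extension described above could look roughly like this. It is a sketch under my own assumptions: the names horizon, future_dates and extended_data are illustrative, and the future dates are derived from the last observed date rather than hard-coded.

```r
library(dplyr)

data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)

horizon <- 12  # forecast length in months

# Monthly dates following the last observation; the first element of the
# sequence is the last observed date itself, so it is dropped
future_dates <- seq(max(data$date), by = "month", length.out = horizon + 1)[-1]

# Append the future dates with the target left empty
extended_data <- dplyr::bind_rows(
  data,
  tibble::tibble(date = future_dates, unemploy = NA_real_)
)

tail(extended_data, horizon + 1)
```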

Now we need to take care of the date column. xgboost does not handle date columns well, so we split the date into several numeric columns that describe the time granularity, in this case months and years:
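A sketch of the date split, assuming lubridate is installed. The names extended_data_mod, months and years follow the snippets quoted in the comments; the setup lines are repeated only so the example runs on its own.

```r
library(dplyr)

# Setup repeated from the previous steps
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)
future_dates <- seq(max(data$date), by = "month", length.out = 13)[-1]
extended_data <- dplyr::bind_rows(
  data,
  tibble::tibble(date = future_dates, unemploy = NA_real_)
)

# Replace the information carried by the date with numeric features
extended_data_mod <- extended_data %>%
  dplyr::mutate(
    months = lubridate::month(date),
    years  = lubridate::year(date)
  )

tail(extended_data_mod)
```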

Now we can split the data into training set and prediction set:

In order to use xgboost we need to transform the data into matrix form and extract the target variable. Additionally, we need to drop the date column and use only the newly created columns:
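The split and the matrix conversion might be sketched as follows. Two assumptions on my side: the prediction set is identified by the missing target, and plain numeric matrices are used instead of xgb.DMatrix, following a reader's report in the comments that caret::train fails on xgb.DMatrix input.

```r
library(dplyr)

# Setup repeated from the previous steps
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)
future <- tibble::tibble(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data_mod <- dplyr::bind_rows(data, future) %>%
  dplyr::mutate(months = lubridate::month(date), years = lubridate::year(date))

# Rows with a known target form the training set, the rest is to be predicted
train <- extended_data_mod %>% dplyr::filter(!is.na(unemploy))
pred  <- extended_data_mod %>% dplyr::filter(is.na(unemploy))

# Feature matrices without the date column, plus the target vector
x_train <- train %>% dplyr::select(months, years) %>% as.matrix()
x_pred  <- pred %>% dplyr::select(months, years) %>% as.matrix()
y_train <- train$unemploy
```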

 

xgboost prediction

With the data prepared as in the previous section, we can perform the modeling in the same manner as if we were not dealing with time series data. We need to provide the grid of hyperparameters for model tuning. We also specify the cross-validation method with the number of folds and enable parallel computation.
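A possible setup for the resampling scheme and the tuning grid, assuming caret is installed. The object names xgb_trcontrol and xgb_grid match the calls quoted in the comments; the number of folds and all grid values are illustrative assumptions, not recommended settings.

```r
# Cross-validation settings
xgb_trcontrol <- caret::trainControl(
  method = "cv",
  number = 5,            # number of folds (assumed value)
  allowParallel = TRUE,  # only has an effect if a parallel backend is registered
  verboseIter = FALSE,
  returnData = FALSE
)

# caret's "xgbTree" method expects all seven of these tuning parameters
xgb_grid <- expand.grid(
  nrounds = c(100, 200),
  max_depth = c(5, 10),
  eta = 0.1,
  gamma = 0,
  colsample_bytree = 0.9,
  min_child_weight = 1,
  subsample = 1
)

nrow(xgb_grid)  # number of candidate parameter combinations
```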

Now we can build the model using the tree models:

Let’s check the best values that were chosen as hyperparameters:

And perform the forecast:
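The three modeling steps above (training, inspecting the chosen hyperparameters, forecasting) could be sketched as follows. The caret::train call mirrors the one quoted in the comments; the grid values and the seed are my assumptions. Depending on the installed caret and xgboost versions, this may emit deprecation warnings such as the ntree_limit one mentioned in the comments.

```r
library(dplyr)

# Setup repeated from the data preparation steps
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)
future <- tibble::tibble(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data_mod <- dplyr::bind_rows(data, future) %>%
  dplyr::mutate(months = lubridate::month(date), years = lubridate::year(date))
train <- extended_data_mod %>% dplyr::filter(!is.na(unemploy))
pred  <- extended_data_mod %>% dplyr::filter(is.na(unemploy))
x_train <- train %>% dplyr::select(months, years) %>% as.matrix()
x_pred  <- pred %>% dplyr::select(months, years) %>% as.matrix()
y_train <- train$unemploy

xgb_trcontrol <- caret::trainControl(method = "cv", number = 5, allowParallel = TRUE)
xgb_grid <- expand.grid(
  nrounds = c(100, 200), max_depth = c(5, 10), eta = 0.1, gamma = 0,
  colsample_bytree = 0.9, min_child_weight = 1, subsample = 1
)

# Build the model with the tree booster
set.seed(42)  # makes the fold assignment reproducible
xgb_model <- caret::train(
  x_train, y_train,
  trControl = xgb_trcontrol,
  tuneGrid = xgb_grid,
  method = "xgbTree",
  nthread = 1
)

# Best hyperparameter combination found by cross-validation
xgb_model$bestTune

# Forecast for the 12 future months
xgb_pred <- xgb_model %>% stats::predict(x_pred)
xgb_pred
```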

 

forecast object

As we have the values predicted, we can turn the results into a forecast object, like the one we would get from the forecast package. That allows us, for example, to use the forecast::autoplot function to plot the results of the prediction. In order to do so, we need to define the several components that make up a forecast object.

Now we can easily plot the data.
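One way to assemble and plot such an object is sketched below. The names fitted and xgb_forecast follow the comments; the helper ym, the name actuals and the list layout are my assumptions about which fields autoplot needs (no prediction intervals are supplied). A single parameter combination is used here only to keep the example short.

```r
library(dplyr)

# Setup repeated from the earlier steps so the example runs on its own
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)
future <- tibble::tibble(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data_mod <- dplyr::bind_rows(data, future) %>%
  dplyr::mutate(months = lubridate::month(date), years = lubridate::year(date))
train <- extended_data_mod %>% dplyr::filter(!is.na(unemploy))
pred  <- extended_data_mod %>% dplyr::filter(is.na(unemploy))
x_train <- train %>% dplyr::select(months, years) %>% as.matrix()
x_pred  <- pred %>% dplyr::select(months, years) %>% as.matrix()
y_train <- train$unemploy

set.seed(42)
xgb_model <- caret::train(
  x_train, y_train,
  trControl = caret::trainControl(method = "cv", number = 5),
  tuneGrid = expand.grid(
    nrounds = 100, max_depth = 5, eta = 0.1, gamma = 0,
    colsample_bytree = 0.9, min_child_weight = 1, subsample = 1
  ),
  method = "xgbTree",
  nthread = 1
)
xgb_pred <- xgb_model %>% stats::predict(x_pred)

# Hypothetical helper: c(year, month) start index for a monthly ts
ym <- function(d) c(lubridate::year(d), lubridate::month(d))

# Observed series, in-sample fit and forecast as monthly ts objects
actuals <- stats::ts(y_train, start = ym(min(train$date)), frequency = 12)
fitted <- xgb_model %>%
  stats::predict(x_train) %>%
  stats::ts(start = ym(min(train$date)), frequency = 12)
xgb_forecast <- xgb_pred %>%
  stats::ts(start = ym(min(pred$date)), frequency = 12)

# Collect the pieces in a list and mark it as a forecast object
forecast_list <- list(
  model = xgb_model$modelInfo,
  method = xgb_model$method,
  mean = xgb_forecast,
  x = actuals,
  fitted = fitted,
  residuals = as.numeric(actuals) - as.numeric(fitted)
)
class(forecast_list) <- "forecast"

forecast::autoplot(forecast_list)
```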

 

xgboost forecast with regressors

A nice thing about forecasting with xgboost is that we can easily use additional regressors with our time series. To do so, we just need to extend the xgboost data matrices accordingly.

Of course we need to make sure that the dates of our predictors are aligned with the initial time series dataset.
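As a sketch of how the matrices might be extended, the example below joins a regressor table to the series by date, which also checks the alignment. The regressor days_in_month is purely hypothetical and chosen only because its future values are known in advance; any real regressor would need values for the forecast horizon as well.

```r
library(dplyr)

# Setup repeated from the data preparation steps
data <- ggplot2::economics %>%
  dplyr::select(date, unemploy)
future <- tibble::tibble(
  date = seq(max(data$date), by = "month", length.out = 13)[-1],
  unemploy = NA_real_
)
extended_data_mod <- dplyr::bind_rows(data, future) %>%
  dplyr::mutate(months = lubridate::month(date), years = lubridate::year(date))

# Hypothetical regressor table keyed by date, covering history and horizon
regressors <- tibble::tibble(
  date = extended_data_mod$date,
  days_in_month = as.numeric(lubridate::days_in_month(date))
)

# Joining on date keeps the regressors aligned with the series
with_regressors <- extended_data_mod %>%
  dplyr::left_join(regressors, by = "date")

# Every row, including the future ones, must have a regressor value
stopifnot(!anyNA(with_regressors$days_in_month))

train <- with_regressors %>% dplyr::filter(!is.na(unemploy))
pred  <- with_regressors %>% dplyr::filter(is.na(unemploy))

# Extended feature matrices: time features plus the regressor
x_train <- train %>% dplyr::select(months, years, days_in_month) %>% as.matrix()
x_pred  <- pred %>% dplyr::select(months, years, days_in_month) %>% as.matrix()
y_train <- train$unemploy
```

These matrices can then be passed to caret::train and stats::predict in the same way as the ones without regressors.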

24 thoughts on “xgboost time series forecast in R”

  • Looking into predicting stocks. I have tried out the prophet package for time series which is good. Had no luck with adding regressors. This example is definitely helpful. Do you have any other suggestions on predicting time series with regressors? I know you mentioned arima, but wondering if you had other suggestions.

    • in principle yes, depends on data granularity and of course the more points the more sensible prediction may be expected.

  • Is there an easy way to convert this solution to be working on weekly data (or even daily) ??
    I have tried changing the frequency of the ts objects (created in the fitted and xgb_forecast variables) to a weekly frequency or even tried to convert the xts objects to a ts without considering a freq but had no luck…

    • Yes! The key is to specify your artificial time-related columns properly. For daily data use daily granularity, for weekly data weekly granularity, etc.

      For some daily dataset:
      extended_data_mod <- extended_data %>%
      dplyr::mutate(.,
      days = lubridate::day(Date),
      months = lubridate::month(Date),
      years = lubridate::year(Date))

      Then for the fitted values and prediction you need to pass daily index:
      fitted <- xgb_model %>%
      stats::predict(x_train) %>%
      stats::ts(start = c(lubridate::year(min(train$Date)), lubridate::yday(min(train$Date))),
      end = c(lubridate::year(max(train$Date)), lubridate::yday(max(train$Date))),
      frequency = 365)

      xgb_forecast <- xgb_pred %>%
      stats::ts(start = c(lubridate::year(min(pred$Date)), lubridate::yday(min(pred$Date))),
      end = c(lubridate::year(max(pred$Date)), lubridate::yday(max(pred$Date))),
      frequency = 365)

      Similarly for any other data granularity.

  • xgb_model <- caret::train(
    x_train, y_train,
    trControl = xgb_trcontrol,
    tuneGrid = xgb_grid,
    method = "xgbTree",
    nthread = 1
    )
    For this line it gives the following error :
    Error in [.xgb.DMatrix(x, 0, , drop = FALSE) :
    unused argument (drop = FALSE)

  • I have an error every time in this part:
    xgb_model = train(
    x = x_train,
    y = y_train,
    trControl = xgb_trcontrol,
    tuneGrid = xgb_grid,
    method = "xgbTree",
    nthread = 1
    )

    error: Error in [.xgb.DMatrix(x, 0, , drop = FALSE) :
    unused argument (drop = FALSE)

    Maybe some had the same one?

  • xgb_model <- caret::train(
    x_train, y_train,
    trControl = xgb_trcontrol,
    tuneGrid = xgb_grid,
    method = "xgbTree",
    nthread = 1
    )
    For this line it gives also the following error :
    Error in [.xgb.DMatrix(x, 0, , drop = FALSE) :
    unused argument (drop = FALSE)

  • Why is the predicted value of the test set the same as the fitting value of the last year of the training set? Is there a solution?

  • I have downgraded caret, but it still does not work for me:
    > xgb_pred <- xgb_model %>% stats::predict(x_pred)
    Error in [.xgb.DMatrix(newdata, , colnames(newdata) %in% object$finalModel$xNames, :
    unused argument (drop = FALSE)

  • Thanks for your tutorial. For the record, the xgb.DMatrix error can be avoided by removing the xgb.DMatrix conversion. Instead, simply convert to a classic R matrix: as.matrix(pred %>% dplyr::select(months, years)).
    You only need to do this for the train and pred variables, and everything will work fine.

    • Thank you very much jons2580, I had spent more than 2 months trying to solve the error
      Error in [.xgb.DMatrix(x, 0, , drop = FALSE):
      unused argument (drop = FALSE)
      and with your tip I managed to get past that error and move on with the project.

      One recommendation: while it is true that we can replace the xgboost matrix (xgb.DMatrix) with the traditional matrix (as.matrix), it is important to keep it as follows

      test_Dmatrix %
      dplyr::select(months,years) %>%
      as.matrix(pred %>% dplyr::select(months, years)) %>%
      xgb.DMatrix()

      Many thanks again

  • Hello
    I got this error:
    WARNING: src/c_api/c_api.cc:935: ntree_limit is deprecated, use iteration_range instead.
    How can I solve it?
    Thank you

  • Hi

    Does the model with regressors expect that we have the data of regressors available for the future? intuitively it seems so but am I missing something?

  • Hello, is this an irregular time series? The date ranges don’t look the same. If so, does this mean that XGBoost can also be used on irregular time series?
