Forecasting is the process of estimating future values based on historical ones. The data is described by a time series, which is simply a series of time-dependent data points. We commonly forecast costs or sales over time, try to predict weather conditions, or model stock price changes. Essentially, any process that can be described as time dependent with a certain time interval (hourly, daily, monthly…) is a candidate.
In general, forecasting follows the traditional machine learning steps:
- data cleansing and formatting
- model building
- evaluating results
What is different here is that all our observations are associated with time, which gives forecasting several distinctive features that matter for this type of analysis. Regardless of the data or the forecasting problem we want to solve, there are several things to pay attention to before and during the actual modelling. Below I describe the aspects I consider important for every forecasting project. Deciding on them beforehand speeds up the modelling and saves you time that would otherwise be spent on redesigning the concept.
Decide what you want to predict
This may sound obvious, but it is not always so. Classical time series prediction is based purely on history: we look at historical values and model what the next observation is going to be, which means we use a single data source for our prediction. However, we may also try to use more sources as so-called external regressors (predictors). External regressors are other time-dependent data sources which we do not model themselves; instead, we use them to improve the prediction of our main series. Examples include changes in the law, holiday effects, or currency rate changes.
- prediction without regressors:
Consider daily temperature changes in New York.
- prediction with regressors:
Consider daily air pollution indicators in New York. During holidays there is less traffic in the city, which impacts air quality (holiday dates are the regressors in this case).
It is good to decide whether we are going to use external predictors before modelling starts, because the two cases call for different modelling techniques. Some models simply do not accept external regressors, so you cannot just add them later; doing so would most likely require completely changing the modelling strategy. There are some models (like ARIMA) that can be used both with and without regressors, but that is not the general rule. Knowing upfront whether we will use them allows us to select a proper modelling strategy.
- Some models which use regressors: ARIMA, LSTM, BSTS, Prophet
- Some models which do not use regressors: ETS, TBATS, Holt-Winters
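To make the difference concrete, here is a minimal Python sketch (standard library only; the series, the holiday flags, and the 0.8 / -3 coefficients are all made up for illustration, and a plain least-squares fit stands in for a real forecasting model). It fits the next value once from history alone, and once from history plus a holiday indicator used as an external regressor:

```python
# Sketch: predicting y_t from its own past (no regressor) versus from
# its past plus a holiday flag (external regressor), via least squares.

def fit_two_coeffs(u, v, z):
    """Solve least squares z ~ a*u + b*v via the 2x2 normal equations."""
    suu = sum(x * x for x in u)
    svv = sum(x * x for x in v)
    suv = sum(x * y for x, y in zip(u, v))
    suz = sum(x * y for x, y in zip(u, z))
    svz = sum(x * y for x, y in zip(v, z))
    det = suu * svv - suv * suv
    a = (suz * svv - svz * suv) / det
    b = (svz * suu - suz * suv) / det
    return a, b

# Synthetic, noise-free pollution series: y_t = 0.8*y_{t-1} - 3*holiday_t
holiday = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y = [10.0]
for h in holiday[1:]:
    y.append(0.8 * y[-1] - 3 * h)

lagged = y[:-1]           # y_{t-1}
target = y[1:]            # y_t
regressor = holiday[1:]   # holiday_t

# History only: a single autoregressive coefficient.
a_hist = sum(u * z for u, z in zip(lagged, target)) / sum(u * u for u in lagged)
print(a_hist)

# History plus external regressor.
a, b = fit_two_coeffs(lagged, regressor, target)
print(round(a, 3), round(b, 3))  # recovers 0.8 and -3.0
```

On this noise-free data the fit with the regressor recovers the true coefficients exactly, while the history-only model cannot, since it has no way to account for the holiday dips.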
Choose prediction time interval
The dataset we work with usually has a certain time interval: we may have daily, weekly, monthly, quarterly, … results. In most cases we then predict results at the same aggregation level, but this is not a must. We may decide to produce more aggregated predictions than the data itself; for example, historical data may be kept at a monthly level while we need yearly predictions. For such a case there are a few approaches we can take.
We can simply aggregate our data to yearly values and perform the prediction on those. The problem is that after aggregation we may end up with too few data points to build a forecasting model, though we may still try regression modelling instead.
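A sketch of this first approach (synthetic numbers; a plain least-squares trend line stands in for a model, since five yearly points are too few for a real forecasting model):

```python
# Sketch: aggregate monthly history to yearly totals, then fit a simple
# linear trend on the few yearly points and extrapolate one year ahead.

monthly = list(range(1, 61))  # 5 years of synthetic monthly values: 1..60

# Aggregate: one total per block of 12 months.
yearly = [sum(monthly[i:i + 12]) for i in range(0, len(monthly), 12)]

# Ordinary least-squares line through (year_index, yearly_total).
xs = list(range(len(yearly)))
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(yearly) / n
num = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, yearly))
den = sum((x - mean_x) ** 2 for x in xs)
slope = num / den
intercept = mean_y - slope * mean_x

next_year = slope * len(yearly) + intercept
print(yearly)      # [78, 222, 366, 510, 654]
print(next_year)   # 798.0 — the linear trend continues exactly here
```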
We may perform a regular monthly forecast and then aggregate the results to yearly values. But that carries the risk of compounding the uncertainty that comes with our forecasts: we essentially cannot tell what the proper confidence level for such an aggregated forecast is.
There is one more possibility, and it seems best from my perspective. It was described by Rob J. Hyndman and you can find it here. The example is written in R but can easily be transferred to Python or another language. In principle, we first pre-aggregate the data, replacing each value with the sum of as many previous values as fit in the aggregated period (the last 12 months for yearly results, the last 24 hours for daily results, and so on). We then perform the modelling and forecasting on these pre-aggregates. In the end we just select the proper results as our forecasts of interest (every 12th forecast holds an actual yearly value). This method allows us to easily determine confidence intervals for our predictions while keeping the flexibility of forecast modelling.
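The idea can be sketched in a few lines of Python (synthetic data; a naive drift forecast stands in for whatever real model you would fit on the pre-aggregated series):

```python
# Sketch of the rolling-sum approach: replace each point with the sum of
# the last 12 months, forecast on that smoother series, then read off
# every 12th forecast as the yearly prediction.

def rolling_sum(values, window):
    """Trailing sums: element i covers the `window` values ending at i."""
    return [sum(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

def drift_forecast(series, horizon):
    """Naive-with-drift: extend the last value by the average step."""
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * step for h in range(1, horizon + 1)]

monthly = list(range(1, 61))              # 5 years of synthetic monthly values
trailing_year = rolling_sum(monthly, 12)  # each point = last 12 months' total

forecasts = drift_forecast(trailing_year, 12)
yearly_prediction = forecasts[11]         # every 12th forecast is a yearly total
print(yearly_prediction)                  # 798.0 on this linear series
```

Because the model is fitted directly to the rolling yearly totals, its prediction intervals apply directly to the yearly forecast, which is exactly what the aggregate-afterwards approach cannot give us.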
Regardless of the aggregation level we choose, it makes sense to decide upon it upfront, before spending time on modelling. One thing to note is that the more we aggregate, the smoother the forecast we get.
You can check my example of yearly forecasts in R in my previous post.
Choose data format
A forecasting dataset always consists of a time variable and some time-dependent values that we want to predict. Usually we also have so-called features: columns describing the dataset, like country, city, product or company. These are characteristics by which the data is split, and the splits determine how many distinct time series we actually have (one per split). Each of them requires separate modelling and may be treated independently. We may also benefit from data nesting.
By data nesting I mean combining all time-dependent values into one structure per category split.
Let’s say, for the sake of simplicity, that we only have one feature, country, and a sample of our data looks like this:
After nesting we end up with just one row per country:
where each nested_structure follows the same pattern:
Such a nested format has proved very useful and efficient for forecasting. You can check how to do data nesting in R in my previous post; the tidyr package is handy for that.
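The same nesting idea in Python might look like this (hypothetical country names and values, chosen only for illustration; in R the tidyr equivalent is nest()):

```python
# Sketch: rows of (country, date, value) become one nested structure
# per country, so each country carries its own small time series.

from collections import defaultdict

rows = [
    ("Poland", "2020-01", 10), ("Poland", "2020-02", 12),
    ("Spain",  "2020-01", 7),  ("Spain",  "2020-02", 9),
]

nested = defaultdict(list)
for country, date, value in rows:
    nested[country].append((date, value))

# One "row" per country, ready for independent per-series modelling.
for country, series in nested.items():
    print(country, series)
```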
Choose level of aggregation
We may have more than one feature in our dataset. Moreover, the features may form a hierarchy (like region -> subregion -> country). In such a case it is good to forecast in a way that makes forecasts at low aggregation levels add up to those at more aggregated levels. This is called hierarchical forecasting: the time series are linked together in a hierarchical structure. We can also perform grouped forecasting, where the results of certain subgroups add up to the result for the full population.
There are two main methods for such forecasting: top-down and bottom-up. Mixing the two gives us the middle-out version.
In top-down we start with forecasts of the aggregates and then disaggregate based on forecast proportions. Those proportions may be tricky to obtain, though (we can take, for example, averages of the historical proportions of the series, or historical averages of the proportions).
In bottom-up we do the opposite: we start with low-level forecasts and then aggregate them up the hierarchy. Thanks to this aggregation no information is lost; on the other hand, bottom-level data can be quite noisy and more challenging to model.
Middle-out starts at some middle level and goes in both directions using the methods mentioned above.
Additionally, we can use a more general approach: obtain results for each level of the hierarchy using some modelling technique, and then adjust them to align with the hierarchical constraints. In a nutshell, we “fix” all the forecasting results. Such fixing can be done as a later step in our pipeline, after we tackle all levels independently; however, deciding on it upfront allows us to structure the pipeline properly from the very start.
For hierarchical and grouped time series in R you can use the hts package.
Choose modelling technique
There are different forecasting techniques: the very popular ARIMA (autoregressive integrated moving average), exponential smoothing methods (like Holt-Winters or ETS), TBATS (trigonometric seasonality, Box-Cox transformation, ARMA residuals, trend and seasonality), and quite a few others. The initial subset is determined by deciding whether we use external regressors or not, but that still leaves plenty to check.
Depending on the use case, some models perform better than others, and sometimes several of them give similarly good results on the same data. It also happens that some perform better on one part of the data and others on a different part. The good news is that we don’t have to stick to just one modelling technique: we can build a hybrid. A hybrid is a model created by combining the results of other models. Such hybrids may give better results than the individual models because they leverage several modelling techniques at once.
For building a hybrid we may use as many models as we want, as long as we weight the results properly. Most typically we use either equal weights or weights inversely proportional to each model’s errors.
Instead of using just one modelling technique, we can use, say, three. We then obtain three results for each time series and weight them to get the final prediction. As our confidence interval we should take the widest one.
A nice thing about hybrids is that you may start with just one model and then add more as you go; it is good to leave yourself that option.
Here you can find an example of how to build hybrids in R; you can also check my example code. The forecastHybrid package is very useful here.
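A minimal sketch of the weighting in Python (all numbers are invented for illustration; the three point forecasts and validation errors would come from actual fitted models):

```python
# Sketch of a hybrid: combine three models' forecasts with weights
# inversely proportional to each model's validation error, and take the
# widest of the individual confidence intervals.

forecasts = [100.0, 110.0, 95.0]   # one point forecast per model
errors    = [1.0, 2.0, 4.0]        # e.g. validation MAE per model

inv = [1.0 / e for e in errors]
s = sum(inv)
weights = [w / s for w in inv]      # [4/7, 2/7, 1/7]: best model weighs most

hybrid = sum(w * f for w, f in zip(weights, forecasts))
print(hybrid)

# Confidence interval: take the widest of the models' intervals.
intervals = [(90, 112), (95, 118), (85, 105)]
hybrid_interval = max(intervals, key=lambda iv: iv[1] - iv[0])
print(hybrid_interval)  # (95, 118)
```

With equal weights this reduces to a plain average of the three forecasts; the inverse-error scheme simply trusts the historically more accurate models more.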
Choose forecast period
The further into the future we forecast, the less accurate the results will be. Of course, the clearer the trend and seasonality, the better, but in real-life cases we cannot trust far-future predictions too much. We live in a constantly changing world, and current patterns usually differ slightly (or sometimes greatly) from historical ones. That is why we should choose the forecast period wisely; it definitely should not exceed the number of historical data points our model is based on.
There is no strict rule defining the optimal number of periods, because it very much depends on the use case. For daily data we can justify more forecasted points than for yearly data, as we expect larger changes over a year than over a week. Generally, the shorter the period, the more certain the result (and the higher the confidence level). The best way to choose the number of periods is to validate it with cross-validation for the particular case.
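Such validation can be sketched with rolling-origin cross-validation (synthetic series, a simple drift model standing in for a real one, and window sizes chosen arbitrarily for the example):

```python
# Sketch of rolling-origin cross-validation: repeatedly cut the series at
# an origin, forecast up to max_h steps ahead, and average the absolute
# error per horizon. The horizon where error becomes unacceptable tells
# us how far ahead we can reasonably forecast.

def drift_forecast(series, horizon):
    """Naive-with-drift: extend the last value by the average step."""
    step = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * step for h in range(1, horizon + 1)]

# Synthetic data: a trend with periodic bumps.
series = [t + (3 if t % 4 == 0 else 0) for t in range(1, 25)]
max_h = 4
errors = {h: [] for h in range(1, max_h + 1)}

for origin in range(8, len(series) - max_h):
    train = series[:origin]                 # everything up to the origin
    preds = drift_forecast(train, max_h)    # forecast max_h steps ahead
    for h in range(1, max_h + 1):
        actual = series[origin + h - 1]
        errors[h].append(abs(preds[h - 1] - actual))

avg_error = {h: sum(v) / len(v) for h, v in errors.items()}
print(avg_error)  # average absolute error per horizon h
```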