Sales forecasting models can be categorized as deterministic or stochastic. In a deterministic model, the outputs are fully “determined” by the inputs: the same inputs always produce the same outputs.

Stochastic models use an element of probability. The same inputs normally don’t yield exactly the same outputs. This is more in line with how the real world operates. There is always a random element to prediction.
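To make the distinction concrete, here is a minimal sketch. The function names, the 5% growth rate and the noise level are all made up for illustration and are not taken from any real model.

```python
import random

def deterministic_forecast(last_sales, growth=0.05):
    """Same inputs always give the same output."""
    return last_sales * (1 + growth)

def stochastic_forecast(last_sales, growth=0.05, noise_sd=10.0, rng=random):
    """Adds a random shock, so repeated calls usually differ."""
    return last_sales * (1 + growth) + rng.gauss(0, noise_sd)

# Calling the deterministic version twice prints the same number twice.
print(deterministic_forecast(200.0))
print(deterministic_forecast(200.0))

# The stochastic version will normally print two different numbers.
print(stochastic_forecast(200.0))
print(stochastic_forecast(200.0))
```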

One type of stochastic modelling known as Monte Carlo simulation is discussed in another post.

The purpose of this post is to look at different deterministic models in sales forecasting and a method of choosing the best of the lot.

Picking the right sales forecasting model is tricky. Every sales pattern is unique and has different characteristics. A model that fits well with past sales may not work so well predicting future sales.

Because of this, there is a generally accepted way of measuring the success of a model.

- Separate the data into two sets. Use one set to build the model (the training data set) and the other to test its accuracy (the testing data set).
- Measure how far the model’s predictions are from the testing data set.
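The first step above can be sketched as follows. With time series the split is usually done in time order rather than at random, so the most recent periods become the testing set. The function name and the numbers are hypothetical.

```python
def train_test_split_series(series, test_size):
    """Split an ordered series, keeping the most recent points for testing."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]

# Made-up weekly sales, oldest first.
weekly_sales = [120, 135, 128, 150, 162, 158, 171, 180]
train, test = train_test_split_series(weekly_sales, test_size=2)
print(train)  # [120, 135, 128, 150, 162, 158]
print(test)   # [171, 180]
```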

A popular measure of the difference between the predictions and the testing data is the Root Mean Squared Error (RMSE). Since the differences are squared, the positives don’t cancel out the negatives. Taking the square root of the mean squared difference brings it back to the same unit as the data, so you can say the model is off by about x units. When comparing models, a smaller RMSE is better than a bigger one.
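As a small worked example, here is RMSE written out in plain Python. The numbers are made up. Note that the two errors are -3 and +3, so a plain average of the errors would be zero, while the RMSE still reports that the predictions are 3 units off.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

print(rmse([100, 110], [103, 107]))  # 3.0
```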

The code, data and images are available in a GitHub repository.

## The Data

The dataset on the Tableau website containing superstore data fits the problem well. It is complex and variable enough to have seasonal elements, trends as well as random elements.

For reference it can be downloaded here.

The data is well organised and can be divided into categories of items.

Total daily sales look extremely volatile. It might be a good idea to aggregate them up weekly and look only at a specific category.
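Here is a rough sketch of that aggregation using only the standard library. The rows are made-up stand-ins for the real file, and the tuple layout (order date, category, sales) is an assumption for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from datetime import date, timedelta

# Made-up rows: (order date, category, sales).
orders = [
    (date(2017, 1, 2), "Furniture", 100.0),   # a Monday
    (date(2017, 1, 4), "Furniture", 50.0),
    (date(2017, 1, 5), "Technology", 300.0),
    (date(2017, 1, 9), "Furniture", 80.0),    # the following Monday
]

def weekly_sales(rows, category):
    """Sum sales for one category per week, keyed by the Monday that starts the week."""
    totals = defaultdict(float)
    for order_date, row_category, sales in rows:
        if row_category == category:
            week_start = order_date - timedelta(days=order_date.weekday())
            totals[week_start] += sales
    return dict(sorted(totals.items()))

print(weekly_sales(orders, "Furniture"))
```

The two Furniture orders in the first week are summed into one figure, and the Technology order is ignored.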


## Forecast Methods

### Mean Forecast

The average of previous values. This is one of the most basic forecasting methods available. It doesn’t capture trend or seasonality. A mean forecast has strength in its simplicity. It’s sometimes used with unpredictable data where you can’t make any prior assumptions.
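A minimal sketch, with a hypothetical function name and made-up numbers:

```python
from statistics import mean

def mean_forecast(history, horizon):
    """Forecast every future period as the average of all past values."""
    level = mean(history)
    return [level] * horizon

print(mean_forecast([100.0, 120.0, 140.0], horizon=2))  # [120.0, 120.0]
```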

### Naive Forecast

Another really basic forecast. The only difference from the Mean Forecast is the fixed value used: a Naive forecast simply repeats the last actual value that was recorded.

If you moved house in the middle of the year and wanted to forecast your mortgage payments for the next year, a Naive forecast would be better than a Mean forecast, because the most recent payment reflects the new house.

Again, compared with the actual values, this method looks too simple to capture what is happening in the data.
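The mortgage example can be sketched like this. The payment amounts are invented and the function name is hypothetical.

```python
def naive_forecast(history, horizon):
    """Forecast every future period as the last observed value."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon

# Monthly payments before and after a mid-year move (made-up figures).
payments = [900.0, 900.0, 900.0, 1400.0, 1400.0, 1400.0]
print(naive_forecast(payments, horizon=3))  # [1400.0, 1400.0, 1400.0]
print(sum(payments) / len(payments))        # 1150.0, what a mean forecast would use
```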

### Linear Trend Based on all History

A linear trend captures some of the variation but is only slightly better than the other methods. It captures the year on year growth but none of the seasonality.
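For illustration, here is a straight line fitted by ordinary least squares against the period index (0, 1, 2, ...) and then extended forward. This is a sketch under that assumption, with made-up numbers. It is not necessarily how the trend in the repository was fitted.

```python
def fit_linear_trend(history):
    """Least-squares slope and intercept of history against its index 0..n-1."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def linear_trend_forecast(history, horizon):
    """Extend the fitted line for the next `horizon` periods."""
    slope, intercept = fit_linear_trend(history)
    n = len(history)
    return [intercept + slope * (n + step) for step in range(horizon)]

print(linear_trend_forecast([10.0, 12.0, 14.0, 16.0], horizon=2))  # [18.0, 20.0]
```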

### Linear Trend Based on Monthly Mean

The trend within the year is captured using each month’s average. Compared with the annual mean, it captures the variation that happens within the year but not the year on year trend. It performs a lot better, meaning there is more variation within a year than between years.

### Polynomial Trend Based on Monthly Mean

The same as above, but fitting a curve to the data rather than a straight line. This is generally useful when there is no clear straight-line fit, for example when the data changes exponentially. Think of how Bitcoin or Amazon’s share price grew: a straight line will not capture that very well. In this case, though, the error for the curve is larger than the error for the straight line, so the straight line is the better fit.
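One way to run this comparison is sketched below with NumPy's `polyfit`. The data is made up, and `holdout_rmse` is a hypothetical helper. The comparison has to be made on held-out points: on the points used for fitting, a least-squares curve will always do at least as well as a line, so only the testing data can show the line winning.

```python
import numpy as np

def holdout_rmse(train_x, train_y, test_x, test_y, degree):
    """Fit a polynomial of the given degree on the training points, score it on the test points."""
    coefficients = np.polyfit(train_x, train_y, deg=degree)
    predicted = np.polyval(coefficients, test_x)
    return float(np.sqrt(np.mean((np.asarray(test_y) - predicted) ** 2)))

# Made-up values for twelve periods: fit on the first nine, test on the last three.
x = np.arange(1, 13, dtype=float)
y = np.array([20, 23, 21, 26, 27, 31, 30, 34, 35, 38, 40, 43], dtype=float)

for degree in (1, 2):
    error = holdout_rmse(x[:9], y[:9], x[9:], y[9:], degree)
    print(f"degree {degree}: test RMSE = {error:.2f}")
```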

The next set of models is more complex. They deal with more variables and try to capture cyclical and seasonal patterns.

### Seasonal Naive Method

Average out each month’s data and assume this will be the value for the same month in the next period. Furniture sales have both a long-term trend and an annual seasonal pattern. Although this method captures the variation within the year, it doesn’t capture overall growth.

This method looks like a good fit to the data even though it doesn’t capture the overall growth.
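Here is a sketch of the averaging described above, using made-up quarterly numbers to keep it short. Note that the textbook seasonal naive method repeats only the most recent season's values. This version averages all past seasons, as described in this post. The function name is hypothetical.

```python
def seasonal_average_forecast(history, season_length, horizon):
    """Forecast each future period as the average of past values at the same seasonal position.

    Assumes history starts at the first position of a season.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    profile = []
    for position in range(season_length):
        same_position = history[position::season_length]
        profile.append(sum(same_position) / len(same_position))
    start = len(history)
    return [profile[(start + step) % season_length] for step in range(horizon)]

# Two made-up years of quarterly sales; the second year is 4 units higher throughout.
quarterly = [10.0, 20.0, 30.0, 40.0, 14.0, 24.0, 34.0, 44.0]
print(seasonal_average_forecast(quarterly, season_length=4, horizon=4))
# [12.0, 22.0, 32.0, 42.0]
```

The forecast follows the shape of the year, but it lands below last year's actual values, which is the missing growth mentioned above.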

### ARIMA

Auto Regressive Integrated Moving Average (ARIMA) is a popular time series forecasting method that can capture both seasonality and trend. Here it does no better than the Seasonal Naive method, which produces a smaller error.

This brings up a point about selection criteria, where you should favour a simple model over a more complicated one.
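To give a feel for what the “Auto Regressive” and “Integrated” parts of ARIMA mean, here is a deliberately stripped-down, hand-rolled ARIMA(1,1,0) with no constant term. It is only a teaching sketch: it has no moving-average or seasonal terms and uses a simple least-squares estimate, whereas a real ARIMA library would estimate a fuller model. This post does not state which orders were used, so the (1,1,0) choice is purely an assumption.

```python
def arima_110_forecast(history, horizon):
    """Simplified ARIMA(1,1,0): an AR(1) model on first differences, no constant."""
    if len(history) < 3:
        raise ValueError("need at least three observations")
    # "Integrated": model the change between periods instead of the level.
    diffs = [b - a for a, b in zip(history, history[1:])]
    # "Auto Regressive": regress each change on the previous change.
    numerator = sum(prev * curr for prev, curr in zip(diffs, diffs[1:]))
    denominator = sum(prev ** 2 for prev in diffs[:-1])
    phi = numerator / denominator if denominator else 0.0
    forecasts = []
    level, change = history[-1], diffs[-1]
    for _ in range(horizon):
        change = phi * change   # predicted next change
        level = level + change  # add it back to undo the differencing
        forecasts.append(level)
    return forecasts

print(arima_110_forecast([10.0, 12.0, 14.0, 16.0], horizon=2))  # [18.0, 20.0]
```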

### Holt Winters Forecasting Method

Another forecasting method, similar in purpose to ARIMA but with a slightly different approach. It has the lowest error of the methods tried here, which makes it the best option available.
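For the curious, here is a bare-bones additive Holt Winters sketch. It tracks three things: a level, a trend and one seasonal offset per position in the season. The smoothing parameters (`alpha`, `beta`, `gamma`) are fixed at arbitrary values here as an assumption. A library implementation would normally estimate them from the data, and its initialisation may differ from the simple one used below.

```python
def holt_winters_additive(history, season_length, horizon,
                          alpha=0.3, beta=0.1, gamma=0.2):
    """Additive Holt Winters with fixed smoothing parameters."""
    m = season_length
    if len(history) < 2 * m:
        raise ValueError("need at least two full seasons of history")
    # Simple initialisation from the first two seasons.
    first = sum(history[:m]) / m
    second = sum(history[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    seasonal = [history[i] - first for i in range(m)]
    # Update level, trend and seasonal offsets one observation at a time.
    for t in range(m, len(history)):
        previous_level = level
        season = seasonal[t % m]
        level = alpha * (history[t] - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (history[t] - level) + (1 - gamma) * season
    n = len(history)
    return [level + step * trend + seasonal[(n + step - 1) % m]
            for step in range(1, horizon + 1)]

# Made-up quarterly sales with growth and a repeating within-year pattern.
quarterly = [10.0, 20.0, 30.0, 40.0, 14.0, 24.0, 34.0, 44.0]
print(holt_winters_additive(quarterly, season_length=4, horizon=4))
```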

This is just a small sample of the forecasting methods available. They range up to more specialized ones that involve machine learning algorithms. These will be dealt with in a future post.

Sales forecasting doesn’t only have to depend on one forecasting method. Forecasts can use an ensemble of different algorithms to understand different aspects of sales.
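The simplest possible ensemble is an equal-weight average of several forecasts, sketched below with made-up numbers. This is just one way of combining forecasts, and other weighting schemes are possible.

```python
def ensemble_forecast(forecasts):
    """Average several equal-length forecasts period by period."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    horizon = len(forecasts[0])
    if any(len(forecast) != horizon for forecast in forecasts):
        raise ValueError("all forecasts must cover the same number of periods")
    return [sum(values) / len(forecasts) for values in zip(*forecasts)]

model_a = [100.0, 110.0]  # e.g. output of one method
model_b = [120.0, 130.0]  # e.g. output of another
print(ensemble_forecast([model_a, model_b]))  # [110.0, 120.0]
```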

Forecasting is also subjective, as there are elements of judgement that go into the forecast. A good model accommodates all useful approaches and is able to separate out the purely statistical from the judgemental.

If you made it to the end of this post, thanks for reading. Please let me know what you think. It would be great if we could start a discussion on this topic.

### References

Some of the resources I used to create this post are below. They are all really well written.
