## The Forecasting Project

A complete sales forecasting project involves multiple steps, from gathering the data to implementing the forecast in the company's demand planning system. Each of these steps needs to be organised in a way that makes the most of the data available. Any insight gained from looking at the data needs to be integrated into the forecast.

A forecasting project is also more than just the software. The forecast needs to be actionable and properly integrated into the demand planning system for the best ROI.

Here we look at each of the steps and what can be done for the best impact.

## 1. Get Data Ready for Analysis

Start with an audit of the data you have. The audit should cover the major data sources and what you expect to find in each of them.

Getting data ready for analysis is usually the longest and most involved part of a data analysis project. Starting from scratch, you can expect this part of the project to take up to 80% of the total time.

Because of this, any time spent on speeding up the process can have a significant impact on the overall completion time of the project. It will also allow you to spend more time on the actual analysis of the data.

### Quantitative and Qualitative Data

In a forecasting project it is important to use all the information you have at hand when building the forecast. This means that the data side of things is only part of the overall information available. Business and expert knowledge need to be included in the forecast as well.

The final forecast needs to clearly document why the forecast is what it is, along with all the information that went into building it.

Getting as much information as possible from your data is time well spent. Even a small new insight can yield significant opportunities for the business.

## 1.1. Getting at the Data

Start off by documenting the major data sources and what you know about the contents of each one. This could include the opinions of subject matter experts in the business. It could also include information about future events that will affect the forecast, such as new product launches or marketing initiatives.

### 1.1.1. Data Extraction

Data sources are normally files and databases, but nowadays they also include data from newer sources.

- Files
- SQL
- Web Scraping
- APIs
- Public Data
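As a minimal sketch of the first two source types, the snippet below reads sales rows from a CSV file and from a SQL table into one common shape. The column names (`week`, `product`, `sales`), the table name `sales` and the in-memory stand-ins for the file and database are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def read_sales_csv(text_stream):
    """Read rows of (week, product, sales) from a CSV file-like object."""
    reader = csv.DictReader(text_stream)
    return [
        {"week": row["week"], "product": row["product"], "sales": float(row["sales"])}
        for row in reader
    ]


def read_sales_sql(connection):
    """Read the same shape of rows from a SQL table assumed to be called `sales`."""
    cursor = connection.execute("SELECT week, product, sales FROM sales")
    return [
        {"week": week, "product": product, "sales": float(sales)}
        for week, product, sales in cursor
    ]


# Stand-in sources so the sketch runs on its own (hypothetical data).
csv_source = io.StringIO("week,product,sales\n2023-W01,A,100\n2023-W01,B,250\n")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (week TEXT, product TEXT, sales REAL)")
db.execute("INSERT INTO sales VALUES ('2023-W02', 'A', 120.0)")

# Both sources end up in one list with a common structure.
records = read_sales_csv(csv_source) + read_sales_sql(db)
print(records)
```

Bringing every source into the same record shape early on makes the later cleaning and analysis steps independent of where the data came from.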

**Automation**

Getting the data could be a one-off process or a regular one. Depending on how often extraction happens and how complicated it is, automated data extraction can be built to streamline this task.

### 1.1.2. Data Cleaning

At this point, documentation of the data sources as well as their data will be at hand. There could be further work to do in improving the quality of the data for analysis. These are decisions to be made based on the type of data and on what gives the most accurate representation of reality.
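A minimal cleaning sketch might look like the following. The rules used here (drop exact duplicates, drop rows with missing or negative sales) are assumptions for illustration only; in a real project a negative value could be a legitimate return, which is exactly the kind of decision described above.

```python
def clean_records(records):
    """Drop exact duplicates and rows with missing or negative sales.

    The rules are illustrative assumptions, not a general recipe.
    """
    seen = set()
    cleaned = []
    for row in records:
        key = (row["week"], row["product"], row["sales"])
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if row["sales"] is None or row["sales"] < 0:
            continue  # missing or implausible value
        cleaned.append(row)
    return cleaned


# Hypothetical raw data with one duplicate, one missing and one negative value.
raw = [
    {"week": "2023-W01", "product": "A", "sales": 100.0},
    {"week": "2023-W01", "product": "A", "sales": 100.0},
    {"week": "2023-W02", "product": "A", "sales": None},
    {"week": "2023-W03", "product": "A", "sales": -5.0},
    {"week": "2023-W04", "product": "B", "sales": 80.0},
]
cleaned = clean_records(raw)
print(f"kept {len(cleaned)} of {len(raw)} rows")
```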

### 1.1.3. Analysis

Once the data is in a form that aids analysis, it is a good opportunity to get more insight into the data and even to confirm the organisation's understanding of its data.

**Categorical Data**

Categorical data is normally non-numeric data that can be divided into a fixed number of categories. It can give you insight into your population and demographics.
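A frequency table is often the first thing to look at for a categorical field. The sketch below counts how often each category occurs and what share of the total it represents; the `regions` list is made-up data.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, share of total)}, most common first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (count, count / total)
        for category, count in counts.most_common()
    }


# Hypothetical categorical field, e.g. the sales region of each order.
regions = ["North", "South", "North", "East", "North", "South"]
table = frequency_table(regions)
for category, (count, share) in table.items():
    print(f"{category}: {count} ({share:.0%})")
```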

**Numeric Data**

Statistical analysis can be run on numeric data to look for summary values and outliers.
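As a rough sketch, summary values and outliers can be computed with Python's `statistics` module. The outlier rule used here (values more than 1.5 interquartile ranges outside the quartiles) is one common convention chosen for illustration, and the weekly figures are invented.

```python
import statistics


def summarise(values):
    """Return basic summary values and flag outliers with the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical weekly sales with one suspicious week.
weekly_sales = [9800, 10100, 10400, 9900, 10250, 9700, 10050, 25000]
summary = summarise(weekly_sales)
print(summary)
```

An outlier is not necessarily an error. It may be a data entry mistake, or it may be a real promotion week that the forecast should know about.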

**Combination of both**

Numeric data can also be summarised within each category, so that summary values and outliers can be compared across categories.

### 1.1.4. Data Wrangling

Data wrangling is transforming data into a structure that is more useful for analysis.

Most data resides in spreadsheets and relational databases, where it is represented as records and properties in a two-dimensional sheet, with one row per record and one column per property. Data wrangling changes the structure of the data to get higher-level or summary information from the data set.

As an example, we can look at sales data and break it down into categories.
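One possible way to do this restructuring is sketched below: a list of one-row-per-sale records is pivoted into a table of totals per category and month. The field names `category`, `month` and `sales`, and the sample rows, are assumptions for illustration.

```python
from collections import defaultdict


def pivot_sales(records):
    """Turn one-row-per-sale records into {category: {month: total sales}}."""
    table = defaultdict(lambda: defaultdict(float))
    for row in records:
        table[row["category"]][row["month"]] += row["sales"]
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {category: dict(months) for category, months in table.items()}


# Hypothetical transaction-level data.
sales = [
    {"month": "2023-01", "category": "Bikes", "sales": 1200.0},
    {"month": "2023-01", "category": "Helmets", "sales": 300.0},
    {"month": "2023-01", "category": "Bikes", "sales": 800.0},
    {"month": "2023-02", "category": "Bikes", "sales": 1500.0},
]
by_category = pivot_sales(sales)
print(by_category)
```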

The data can then be smoothed further to reveal trends hidden in the sales figures.
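A trailing moving average is one simple smoothing technique, used here as an illustrative choice rather than as the method behind the reports. The window length and the sample values are assumptions.

```python
def moving_average(values, window):
    """Trailing moving average: each point is the mean of the last `window` values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical noisy sales figures.
sales = [10, 12, 8, 14, 16, 12]
smoothed = moving_average(sales, window=3)
print(smoothed)
```

A longer window gives a smoother line but reacts more slowly to genuine changes in the trend.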

**Look for cyclical trends in sales**

**Break down sales into its component parts.**
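As a deliberately simple sketch of such a breakdown, the function below splits a series into an overall level, a repeating seasonal effect and a remainder. It assumes an additive structure with no long-term trend, and the `period` and sample data are hypothetical; a full decomposition would also estimate a trend component.

```python
import statistics


def seasonal_split(values, period):
    """Split values into level + seasonal effect + remainder (additive, no trend)."""
    if period < 1 or period > len(values):
        raise ValueError("period must be between 1 and len(values)")
    level = statistics.mean(values)
    # Average deviation from the level for each position within the cycle.
    effects = [statistics.mean(values[k::period]) - level for k in range(period)]
    seasonal = [effects[i % period] for i in range(len(values))]
    remainder = [v - level - s for v, s in zip(values, seasonal)]
    return level, seasonal, remainder


# Two hypothetical "years" of quarterly sales with a repeating pattern.
quarterly = [100, 120, 80, 100, 104, 124, 84, 104]
level, seasonal, remainder = seasonal_split(quarterly, period=4)
print(level, seasonal[:4], remainder)
```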

A detailed analysis of this step can be found here.

## 2. Forecasting Models

Once you have the data in the format that you want, you can run forecasting models on it.

Forecasting models can be categorized as deterministic or stochastic. In a deterministic model, the outputs are based entirely on the inputs, i.e. the outputs are “determined” by the inputs.
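A straight-line trend is a small example of a deterministic model: given the same history, it always returns the same forecast. The sketch below fits the line by ordinary least squares; the choice of a linear trend and the sample history are assumptions for illustration.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against their index; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast_trend(values, steps):
    """Project the fitted line `steps` periods past the end of the history."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + i) for i in range(steps)]


history = [100, 110, 120, 130]  # hypothetical sales history
forecast = forecast_trend(history, steps=2)
print(forecast)
```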

Stochastic models use an element of probability. The same inputs normally don’t yield exactly the same outputs. This is more in line with how the real world operates. There is always a random element to prediction.

### 2.1 Deterministic Models

Look at individual forecasting methods 2 reports and give a summary.

For a more detailed analysis, go here.

### 2.2 Stochastic Models

**2.2.1 Monte Carlo Forecasting**

Much has been written about Monte Carlo forecasting in the past. The name has a catchy ring to it.

It is a method of running random scenarios based on how data behaved in the past. All the scenarios are then grouped together and statistical analysis is run on them.

**A Random Walk**

Based on how sales are likely to change over the year, we can simulate what could happen over the forecast year. Each of the weekly sales will be drawn from a probability distribution like the **Distribution of Weekly Sales** above.

Numbers in the middle of the distribution (around the 10K mark) will be drawn more frequently than numbers on the sides. Plotted on a graph, it gives you a unique path every time a simulation is run.
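A single simulated path could be sketched as follows. The mean of 10,000 comes from the text above; the normal distribution and the standard deviation of 1,500 are assumptions made for illustration.

```python
import random
from itertools import accumulate


def simulate_weekly_sales(mean, stdev, weeks=52, seed=None):
    """Draw one year of weekly sales from a normal distribution."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    return [rng.gauss(mean, stdev) for _ in range(weeks)]


# Two runs with different seeds give two different paths.
path_a = simulate_weekly_sales(10_000, 1_500, seed=1)
path_b = simulate_weekly_sales(10_000, 1_500, seed=2)

# The running total is what traces out the "walk" over the year.
running_total_a = list(accumulate(path_a))
print(f"Year-end total for path A: {running_total_a[-1]:,.0f}")
```

Leaving `seed` as `None` gives a fresh path on every run, which is the behaviour described above.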

For a more detailed analysis go here.

### Practical Applications

The normal distribution curves can be translated into practical applications.

```
It's highly likely that your sales will be between $530,931 and $678,475.
There's a 25% chance that sales will be less than $579,824.
```

For instance, there is a 25% chance that sales will be below a particular amount. If the chance of this happening is too high, then it needs to be looked at.
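Statements like these can be read off the simulated annual totals as percentiles. In the sketch below, "highly likely" is taken to mean the range between the 5th and 95th percentiles, which is an assumption, as are the weekly mean and standard deviation. The figures it prints will not match the report above, which was produced from different data.

```python
import random
import statistics


def simulate_annual_totals(mean, stdev, weeks=52, runs=5000, seed=None):
    """Run many simulated years and return the total sales of each one."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(mean, stdev) for _ in range(weeks))
        for _ in range(runs)
    ]


def percentile(values, pct):
    """Return the pct-th percentile (1 to 99) of the simulated values."""
    cuts = statistics.quantiles(values, n=100)  # 99 cut points
    return cuts[pct - 1]


totals = simulate_annual_totals(10_000, 1_500, seed=42)
low, high = percentile(totals, 5), percentile(totals, 95)
q25 = percentile(totals, 25)

print(f"It's highly likely that your sales will be between ${low:,.0f} and ${high:,.0f}.")
print(f"There's a 25% chance that sales will be less than ${q25:,.0f}.")
```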

Monte Carlo simulations can be applied to any set of problems that deal with predictable randomness. It is a great tool to have in the CFO's toolkit because it promotes thinking in terms of probability rather than absolutes.

### 2.3 Combination of the Two

Combining deterministic and probabilistic forecasting can make use of the advantages of both methods. You can run simulations, but the simulations can also follow a pattern.

The reports run a Monte Carlo simulation based on a simple trend of last year's sales.
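One way such a hybrid could work is sketched below; it is not necessarily how the reports are built. A straight-line trend is fitted to last year's weekly sales, and each simulated week is the trend projection plus random noise sized from the spread of last year's deviations around that trend. The linear trend, the normal noise and the generated history are all assumptions.

```python
import random
import statistics


def fit_trend(values):
    """Least-squares line through values against their index: (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def simulate_with_trend(history, weeks=52, runs=2000, seed=None):
    """Simulate annual totals as trend projection plus normally distributed noise."""
    slope, intercept = fit_trend(history)
    n = len(history)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(history)]
    noise_sd = statistics.stdev(residuals)  # spread of last year around its trend
    rng = random.Random(seed)
    return [
        sum(intercept + slope * (n + w) + rng.gauss(0, noise_sd) for w in range(weeks))
        for _ in range(runs)
    ]


# Hypothetical last year: a gentle upward trend with week-to-week noise.
history_rng = random.Random(1)
history = [10_000 + 20 * week + history_rng.gauss(0, 500) for week in range(52)]

totals = simulate_with_trend(history, seed=7)
print(f"Median simulated annual sales: {statistics.median(totals):,.0f}")
```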

Although this is a good way of making predictions more accurate, the method does have some shortcomings.

- The hybrid method has more steps than either deterministic or probabilistic forecasting on its own.
- It is even harder to explain to end users, as it involves more steps and more data.

If you have made it to the end, thanks for reading. Please put any opinions and thoughts in the comments section; they always lead to interesting dialogue.