Benchmarking forecast models#

# Required imports for workbook
import warnings

warnings.simplefilter("ignore", category=FutureWarning)

import numpy as np
import pandas as pd
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

import plotly.express as px
import plotly.graph_objects as go

from statsmodels.graphics.tsaplots import plot_acf

random_seed = 42

Once again, we will read in monthly A&E attendances from NHS England (following the same steps as in the first worksheet).

# Import raw data
raw_data = pd.read_excel(
    "https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2024/02/Monthly-AE-Time-Series-January-2024.xls",
    skiprows=13,
    sheet_name="Activity",
)
# Make calendar adjustments
raw_data["Period"] = pd.to_datetime(raw_data["Period"], format="%Y-%m-%d")
raw_data = raw_data[["Period", "Total Attendances"]].set_index("Period").asfreq("MS")
# Add column showing the number of days in each month
raw_data["days_per_month"] = raw_data.index.daysinmonth
# Add column for adjusted data
raw_data["total_attendances_adjusted"] = (
    raw_data["Total Attendances"] / raw_data["days_per_month"]
)
# We will again initially consider data pre-COVID.
ts_data = raw_data.loc[:"2020-02-01"]  # limit data to Feb 2020
# Plot time series of data using Plotly express line
ts_fig = px.line(
    ts_data,
    x=ts_data.index,
    y="total_attendances_adjusted",
    title="Monthly A&E attendances from NHS England",
)
ts_fig.show()

Simple forecasting methods (benchmarking)#

To organise our forecasts, we'll create a new dataframe to store the output of each method. We'll treat pre-COVID observations as our training data, and post-COVID observations as the testing data.

forecast_horizon = 47
forecast_data = pd.DataFrame()
forecast_data.index = pd.date_range(
    start="2020-03-01", periods=forecast_horizon, freq="MS"
)
forecast_data.head()

Let's populate this pandas dataframe with some simple forecasts.

Naive forecast#

The simplest forecast is to take the most recent observation and project it forward unchanged. This means all but one data point are ignored.

# Calculating Naive forecast

latest_value = ts_data.iloc[-1]["total_attendances_adjusted"]  # Get latest data point
forecast_data["naive"] = [latest_value] * forecast_horizon  # Add to forecast_data
forecast_data.head()
simple_fcsts_fig = go.Figure()
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["total_attendances_adjusted"], name="Historical data"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["naive"], name="Naive forecast")
)
simple_fcsts_fig.update_layout(title="NHS England average daily attendances")

Seasonal naive forecast#

If we are working with data that has a clear seasonal pattern, we may want to use the most recent seasonal period. Seasonal naive uses the latest value from the same season (i.e., the most recent December, or the most recent Monday, etc.). In this case, we are working with monthly data, so the forecast for each month is the corresponding month from the most recent 12 months of the observation period.

def s_naive(ts, period, forecast_horizon):
    """
    Inputs:
        ts (pandas.Series): Historical time series observations (in order)
        period (int): Seasonal period (i.e., 12 for monthly data)
        forecast_horizon (int): Number of timesteps forecasted into the future
    Outputs:
        list: Forecasted time series
    """

    most_recent_seasonal = ts.iloc[-period:].to_list()

    # We now need to make the forecast
    # Number of times to multiply the list to ensure we meet forecast horizon
    mult_list = int(np.ceil(forecast_horizon / period))

    return (most_recent_seasonal * mult_list)[:forecast_horizon]
# Use custom function to calculate seasonal naive forecast

forecast_data["snaive"] = s_naive(
    ts_data["total_attendances_adjusted"], period=12, forecast_horizon=forecast_horizon
)
forecast_data.head()
simple_fcsts_fig = go.Figure()
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["total_attendances_adjusted"], name="Historical data"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=forecast_data.index, y=forecast_data["snaive"], name="Seasonal naive forecast"
    )
)
simple_fcsts_fig.update_layout(title="NHS England average daily attendances")
simple_fcsts_fig.add_vrect(
    x0=ts_data.index[-12],
    x1=ts_data.index[-1],
    annotation_text="Latest seasonal period",
    annotation_position="top left",
    fillcolor="green",
    opacity=0.25,
    line_width=0,
)

Mean forecast#

We simply take the mean of historical data as our forecast.

forecast_data["mean"] = ts_data["total_attendances_adjusted"].mean()
simple_fcsts_fig = go.Figure()
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["total_attendances_adjusted"], name="Historical data"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["mean"], name="Mean forecast")
)
simple_fcsts_fig.update_layout(title="NHS England average daily attendances")

There is a large difference between the naive and mean forecasts. This is because the naive forecast uses only the latest data point, while the mean considers every data point equally.
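A toy illustration (with made-up numbers) makes the contrast concrete:

# Hypothetical mini-series: naive keeps only the last value, mean weights all values equally
toy_series = pd.Series([10, 12, 14, 30])
print(f"Naive forecast: {toy_series.iloc[-1]}")  # 30
print(f"Mean forecast: {toy_series.mean()}")  # 16.5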

Drift forecast#

The drift method is an extension of the naive method that allows the forecast to increase or decrease over time. The amount of change per timestep (the drift) is the average change seen in the historical data: the difference between the last and first observations, divided by the number of intervals between them.

def drift_method(ts, forecast_horizon):
    """
    Inputs:
        ts (pandas.Series): Historical time series observations (in order)
        forecast_horizon (int): Number of timesteps forecasted into the future
    Outputs:
        list: Forecasted time series
    """

    latest_obs = ts.iloc[-1]
    first_obs = ts.iloc[0]

    slope = (latest_obs - first_obs) / (len(ts) - 1)

    forecast_list = [latest_obs + slope * h for h in range(1, forecast_horizon + 1)]

    return forecast_list
forecast_data["drift"] = drift_method(
    ts_data["total_attendances_adjusted"], forecast_horizon
)
forecast_data.head()
simple_fcsts_fig = go.Figure()
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["total_attendances_adjusted"], name="Historical data"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["drift"], name="Drift forecast")
)
simple_fcsts_fig.update_layout(title="NHS England average daily attendances")

Finally, let's plot all benchmark forecasts along with the test data.

simple_fcsts_fig = go.Figure()
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["total_attendances_adjusted"], name="Historical data"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["naive"], name="Naive forecast")
)
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=forecast_data.index, y=forecast_data["snaive"], name="Seasonal naive forecast"
    )
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["mean"], name="Mean forecast")
)
simple_fcsts_fig.add_trace(
    go.Scatter(x=forecast_data.index, y=forecast_data["drift"], name="Drift forecast")
)
simple_fcsts_fig.add_trace(
    go.Scatter(
        x=raw_data.loc["2020-03-01":].index,
        y=raw_data.loc["2020-03-01":]["total_attendances_adjusted"],
        name="Post-COVID observations",
    )
)
simple_fcsts_fig.update_layout(title="NHS England average daily attendances")

Evaluating forecast accuracy#

We can quantify how well each model performs using simple accuracy metrics, such as the mean absolute percentage error (MAPE): the mean of the absolute errors divided by the observed values, expressed as a percentage.

MAPE has the advantage of being unit free (given as a percentage) and can therefore be compared across datasets.

Let's calculate MAPE manually to compare each forecast, using data from 2022 onwards (after the drop in attendances during COVID).

# Creating a function to calculate MAPE
def mape(observed, forecasted):
    """
    Inputs:
        observed (pandas.Series): Series of observed values
        forecasted (pandas.Series): Series of forecasted values
    Outputs:
        float: MAPE (as a percentage)
    """

    absolute_error = abs(observed - forecasted)
    mape_t = np.mean((absolute_error / observed) * 100)

    return mape_t
# Loop through each forecast and print the MAPE score
for fcst in forecast_data.columns:

    mape_score = mape(
        raw_data.loc["2022-01-01":]["total_attendances_adjusted"],  # observations
        forecast_data.loc["2022-01-01":][fcst],  # forecasts
    )
    print(f"{fcst} had a MAPE score of {round(mape_score, 2)}%")

We could equally calculate this (and other accuracy metrics) by using sklearn (imported in the first cell).

for fcst in forecast_data.columns:

    mape_score = mean_absolute_percentage_error(
        raw_data.loc["2022-01-01":]["total_attendances_adjusted"],
        forecast_data.loc["2022-01-01":][fcst],
    )
    print(f"{fcst} had a MAPE score of {round(mape_score * 100, 2)}%")
Benchmarking models key points
  • Each forecast is simple to interpret and implement:

    • The mean forecast is simply the mean of historical observations.

    • The naive forecast is the most recent value.

    • The seasonal naive forecast is set to be the most recent value from the same season.

    • The drift method accounts for growth (in terms of the difference between the first and last observed values).

  • These simple forecasts can be used to benchmark against more advanced time series models (see the sketch after this list). However, there could be cases where this is as good as it gets!
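To make benchmarking repeatable, the four methods can be collected into a single helper. This is only a minimal sketch (the benchmark_forecasts name is ours, not part of the worksheet), reusing the s_naive and drift_method functions defined above:

def benchmark_forecasts(ts, period, forecast_horizon, index):
    """
    Inputs:
        ts (pandas.Series): Historical time series observations (in order)
        period (int): Seasonal period (i.e., 12 for monthly data)
        forecast_horizon (int): Number of timesteps forecasted into the future
        index (pandas.DatetimeIndex): Dates covering the forecast horizon
    Outputs:
        pandas.DataFrame: One column per benchmark forecast
    """
    return pd.DataFrame(
        {
            "naive": [ts.iloc[-1]] * forecast_horizon,
            "snaive": s_naive(ts, period, forecast_horizon),
            "mean": [ts.mean()] * forecast_horizon,
            "drift": drift_method(ts, forecast_horizon),
        },
        index=index,
    )

# This should reproduce the forecast_data columns built step by step above
benchmark_forecasts(
    ts_data["total_attendances_adjusted"], 12, forecast_horizon, forecast_data.index
).head()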

Task (20-30 minutes)

Let's again consider the daily A&E admissions dataset.

# Creating some fake time series data
np.random.seed(random_seed)

time_index = pd.date_range(start="2022-01-01", end="2024-05-01", freq="D")

seasonal_component = (10 * np.sin((2 * np.pi / 7) * np.arange(len(time_index)))).astype(
    int
)
remainder_component = np.random.normal(loc=0, scale=1.5, size=len(time_index)).astype(
    int
)
trend_component = np.round(np.arange(len(time_index)) / 35).astype(int)

time_series = 50 + seasonal_component + trend_component + remainder_component

# Format data in a pandas dataframe
task_data = pd.DataFrame({"date": time_index, "admissions_daily": time_series})
task_data.head()
training_data = task_data[: (-4 * 7)]
testing_data = task_data[(-4 * 7) :]
  1. Using any two of the naive, mean, seasonal naive, or drift forecasts on the training data, make predictions for the next 4 weeks (the length of the test data).

# Answer here
  2. Plot the forecasts against the observed data during the testing period.

# Answer here
  3. Using mean_absolute_percentage_error (or any of the accuracy metrics from sklearn.metrics), evaluate the accuracy of the forecasts.

# Answer here
  4. Plot the ACF of the residuals for the best performing model. Is there any ‘left over’ information in the error that has been missed by the model (i.e., seasonal variation)?

# Answer here