Exponential Smoothing#

# Required imports for workbook
import warnings

warnings.simplefilter("ignore", category=FutureWarning)

import numpy as np
import pandas as pd
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt, ETSModel

random_seed = 42
# Benchmarking forecast models


def s_naive(ts, period, forecast_horizon):
    """
    Inputs:
        ts (pandas.Series): Historical time series observations (in order)
        period (int): Seasonal period (i.e., 12 for monthly data)
        forecast_horizon (int): Number of timesteps forecasted into the future
    Outputs:
        list: Forecasted time series
    """

    most_recent_seasonal = ts[-period:].to_list()

    # We now need to make the forecast
    # Number of times to multiply the list to ensure we meet forecast horizon
    mult_list = int(np.ceil(forecast_horizon / period))

    return (most_recent_seasonal * mult_list)[:forecast_horizon]


def drift_method(ts, forecast_horizon):
    """
    Inputs:
        ts (pandas.Series): Historical time series observations (in order)
        forecast_horizon (int): Number of timesteps forecasted into the future
    Outputs:
        list: Forecasted time series
    """

    latest_obs = ts.iloc[-1]
    first_obs = ts.iloc[0]

    slope = (latest_obs - first_obs) / (len(ts) - 1)

    forecast_list = [latest_obs + slope * h for h in range(1, forecast_horizon + 1)]

    return forecast_list

Once again, we will read in monthly A&E attendances from NHS England (following the same steps as the first two worksheets).

# Import raw data
raw_data = pd.read_excel(
    "https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2024/02/Monthly-AE-Time-Series-January-2024.xls",
    skiprows=13,
    sheet_name="Activity",
)
# Make calender adjustments
raw_data["Period"] = pd.to_datetime(raw_data["Period"], format="%Y-%m-%d")
raw_data = raw_data[["Period", "Total Attendances"]].set_index("Period").asfreq("MS")
# Add columns showing number of days for given month
raw_data["days_per_month"] = raw_data.index.daysinmonth
# Add column for adjusted data
raw_data["avg_daily_attendances"] = (
    raw_data["Total Attendances"] / raw_data["days_per_month"]
)
# We will again initially consider data pre-COVID.
ts_data = raw_data.loc[:"2020-02-01"]  # limit data to Feb 2020
post_covid = raw_data.loc["2020-02-01":]
# Plot time series of data using Plotly express line
ts_fig = px.line(
    ts_data,
    x=ts_data.index,
    y="avg_daily_attendances",
    title="Monthly A&E attendances from NHS England",
)
ts_fig.show()

Exponential smoothing#

Exponential smoothing (along with ARIMA) is one of the most widely used techniques in time series forecasting.

The exponential smoothing methods use weighted averages of past observations (where weights decay exponentially for older values). In other words, the forecast is more influenced by recent values, compared with older observations.

Simple Exponential Smoothing (SES)#

\(\hat{y}_{T + 1 | T} = \alpha y_{T} + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + ...\), where \(0\leq \alpha \leq1\) is the smoothing parameter.

Note, \(\hat{y}_{T + 1 | T}\) is our 1-step forecast based on data \(y_{1} + y_{2} + ... + y_{T}\)

Exponentially decaying weights#

# How much weight do we want to give to the most recent observation? This is what alpha controls.

alpha = 0.1  # (alpha close to 1 ~ naive)
weight_list = []

for step in range(len(ts_data)):
    weighting = alpha * ((1 - alpha) ** step)
    weight_list.append(weighting)
# Show first 5 weights
weight_list[:5]
# Plot decaying weights
weighting_fig = px.bar(y=weight_list, title=f"Weights decay with alpha = {alpha}")
weighting_fig.update_layout(xaxis_title="Timestep", yaxis_title="Weight")
weighting_fig.show()

Lets create a dataframe of the attendances and the corresponding weight

SES_data = ts_data[["avg_daily_attendances"]].copy()
SES_data["weighting"] = weight_list[::-1]
SES_data.tail()

To make our forecast, we would simply multiply each weight with the corresponding observation (starting with the highest weight for the most recent observation) and sum the values.

# Calculate forecast
ses_forecast = (SES_data["avg_daily_attendances"] * SES_data["weighting"]).sum()
print(ses_forecast)
fcst_fig = go.Figure()
fcst_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
fcst_fig.add_trace(
    go.Scatter(
        x=post_covid.index, y=[ses_forecast] * len(post_covid), name="SES forecast"
    )
)
fcst_fig.update_layout(
    yaxis_title="avg_daily_attendances",
    title="Simple exponential smoothing (weights approach)",
)

This method of SES can be flawed in certain circumstances. Remember, when calculating the mean, each weight has a value \(\dfrac{1}{len(data)}\). We don’t adjust the weighting to ensure it sums to 1 in SES, so need to check this!

# Check weights sum to 1
np.sum(weight_list)

SES using an iterative approach#

\(\hat{y}_{T + 1 | T} = \alpha y_{T}+(1-\alpha)\hat{y}_{T | T - 1}\)
\(\hat{y}_{1} = y_{1}\)

Lets run through an example of using this iterative approach.

SES_data = ts_data[["avg_daily_attendances"]].copy()
SES_data["Forecast"] = np.nan
SES_data.head()

\(\hat{y}_{1} = y_{1}\) so we will use the first observation for our first forecast.

SES_data.loc[pd.to_datetime("2010-08-01"), ["Forecast"]] = SES_data[
    "avg_daily_attendances"
].iloc[0]
SES_data.head()

Now lets calculate \(\hat{y}_{2 | 1}\):

SES_data.loc[pd.to_datetime("2010-09-01"), ["Forecast"]] = (
    alpha * SES_data["avg_daily_attendances"].iloc[1]
    + (1 - alpha) * SES_data["Forecast"].iloc[0]
)
SES_data.head()

Next, forecast \(\hat{y}_{3 | 2}\):

SES_data.loc[pd.to_datetime("2010-10-01"), ["Forecast"]] = (
    alpha * SES_data["avg_daily_attendances"].iloc[2]
    + (1 - alpha) * SES_data["Forecast"].iloc[1]
)
SES_data.head()

This would take a lot of cells to run through the entire dataset…its probably more efficient to do this in a loop!

# given a series and alpha, return series of smoothed points
def SES_iterator(obs_series, alpha):
    """
    Inputs:
        obs_series (pandas.Series): Observational data.
        alpha (float): 0<=alpha<=1, the smoothing parameter.
    Output:
        list: Fitted values (one step forecasts in training data)
    """

    # First fitted value (we'll use our first observation for this)
    ses_fit = [obs_series.iloc[0]]

    # Loop through time series
    for step in range(1, len(obs_series)):

        # Make a 1-step forecast
        one_step_fcst = obs_series.iloc[step] * alpha + (1 - alpha) * ses_fit[step - 1]
        # Add forecast to list
        ses_fit.append(one_step_fcst)

    return ses_fit
# Run iterative SES model
ses_list = SES_iterator(SES_data["avg_daily_attendances"], alpha=0.1)
# check first three are the same as our manual calculation
ses_list[:3]
# The forecast will simply be the last value (flat forecast)
ses_list[-1]

This is pretty much identical to our previous approach:

# Print forecast when using weights approach
print(ses_forecast)

Note, with large T, this will converge even closer to our original forecast.

Lets plot the fitted model and the forecast from our iterative approach.

fcst_fig = go.Figure()
fcst_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
fcst_fig.add_trace(
    go.Scatter(
        x=SES_data.index, y=ses_list, name="Fitted SES model (iterative approach)"
    )
)

fcst_fig.add_trace(
    go.Scatter(
        x=post_covid.index,
        y=[ses_list[-1]] * len(post_covid),
        name="SES forecast (latest fitted value)",
    )
)

fcst_fig.update_layout(
    title="Simple exponential smoothing (iterative approach)",
    yaxis_title="avg_daily_attendances",
)

This is quite a lot of work for a simple forecast. Luckily, there is a much more efficient way of doing this… statsmodels!

SES using statsmodels#

# Fit our attendance data to SimpleExpSmoothing from statsmodels
ses_model = SimpleExpSmoothing(ts_data["avg_daily_attendances"]).fit(
    smoothing_level=0.1
)

Lets calculate a foreast from our fitted model.

# We supply the start and end date for our forecast (which we've done for post COVID period here)
ses_series = ses_model.predict(start=post_covid.index[1], end=post_covid.index[-1])
ses_series.head()
fcst_fig = go.Figure()
fcst_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
fcst_fig.add_trace(
    go.Scatter(x=ses_series.index, y=ses_series, name="SES forecast (statsmodels)")
)
fcst_fig.update_layout(
    title="Simple exponential smoothing (statsforecast)",
    yaxis_title="avg_daily_attendances",
)
Simple Exponential Smoothing key points
  • Simple and intuitive algorithm that uses exponentially decaying weights to form an average (which you control with \(\alpha\)).

  • The forecasts will be flat.

  • Useful forecast model if your data has no trend or seasonal patterns

Task 1 (15-20 minutes)

Again, we will look at the daily A&E admissions dataset.

# Creating some fake time series data
np.random.seed(random_seed)

time_index = pd.date_range(start="2022-01-01", end="2024-05-01", freq="D")

seasonal_component = (10 * np.sin((2 * np.pi / 7) * np.arange(len(time_index)))).astype(
    int
)
remainder_component = np.random.normal(loc=0, scale=1.5, size=len(time_index)).astype(
    int
)
trend_component = np.round(np.arange(len(time_index)) / 35).astype(int)

time_series = 50 + seasonal_component + trend_component + remainder_component

# Format data in a pandas dataframe
task_data = pd.DataFrame({"date": time_index, "admissions_daily": time_series})
task_data.head()
training_data = task_data[: (-4 * 7)]
testing_data = task_data[(-4 * 7) :]
  1. Fit and forecast a simple exponential smoothing model using statsmodels on the daily A&E training_data dataset (set smoothing_level to 0.2). Look at the statsmodels documentation if you’re unsure on anything, available here. Create a new column in testing_data to store the forecast.

# Answer here
  1. How does the forecast look? Use Plotly (or another visualisation library of your choice) to plot forecasts against the test dataset.

# Answer here
  1. How does changing alpha imapct output forecasts? Add 3 additional forecasts with smoothing_level values of 0.01, 0.5, 0.9 (store each as a new column in the testing_data) to the forecast figure, to compare the outputs.

# Answer here

Double Exponential Smoothing (Holt’s linear trend)#

The previous forecast gave a flat forecast for the weighted average. If our time series if showing a clear trend, how can we include a trend in the forecast?

Holt’s linear trend is an extension of SES, where we now have two smoothing equations. One for the level (\(\alpha\)) and one for the trend (\(\beta\)).

Forecast equation: \(\hat{y}_{t + h | t} = l_{t} + hb_{t}\)

Level equation: \(l_{t} = \alpha y_{t} + (1 - \alpha)(l_{t-1} + b_{t-1})\)
Trend equation: \(b_{t} = \beta(l_{t} - l_{t-1}) + (1 - \beta)b_{t-1}\)

# Calculating Holt's method from scratch (feel free to ignore this)


def holts_linear_method(time_series, alpha, beta):

    level_component = []
    trend_component = []
    fitted_model = []

    for t, y in enumerate(time_series):

        if t == 0:
            # **We ned to first estimate our starting values!**
            # Estimate first level.
            level = y
            # Estimate first trend.
            trend = time_series[1] - y

        else:
            # Now we can start the iteration process!
            level = alpha * y + (1 - alpha) * (
                level_component[-1] + trend_component[-1]
            )
            trend = (
                beta * (level - level_component[-1]) + (1 - beta) * trend_component[-1]
            )

        # Combine components to create a 1-step forecast
        forecast = level + trend

        level_component.append(level)
        trend_component.append(trend)
        fitted_model.append(forecast)

    return level_component, trend_component, fitted_model

Extract manual calculation of Holts method components.

manual_level_component, manual_trend_component, manual_forecast = holts_linear_method(
    ts_data["avg_daily_attendances"], 0.1, 0.3
)

Using statsmodels certaintly requires less code!

holt_model = Holt(endog=ts_data["avg_daily_attendances"]).fit(
    smoothing_level=0.1, smoothing_trend=0.3
)

What do the trend and level componants look like?

fig = make_subplots(
    rows=1, cols=2, subplot_titles=("Level component", "Trend component")
)

fig.add_trace(
    go.Scatter(
        x=holt_model.level.index, y=holt_model.level.values, name="Level (statsmodels)"
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=holt_model.trend.index, y=holt_model.trend.values, name="Trend (statsmodels)"
    ),
    row=1,
    col=2,
)

fig.add_trace(
    go.Scatter(
        x=holt_model.level.index,
        y=manual_level_component,
        name="Level (manual calculation)",
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=holt_model.trend.index,
        y=manual_trend_component,
        name="Trend (manual calculation)",
    ),
    row=1,
    col=2,
)

fig.update_layout(
    height=350, width=900, title_text="Componants of Holt's Linear Trend Method"
)
fig.show()

statsforecast is probably a better option than calculating each component from scratch… lets make forecasts from our holt_model.

holt_series = holt_model.predict(start=post_covid.index[1], end=post_covid.index[-1])
holt_series.head()

Lets visualise Holt’s linear trend forecast

holt_fcst_fig = px.line(x=holt_series.index, y=holt_series)
holt_fcst_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
holt_fcst_fig.update_layout(
    yaxis_title="avg_daily_attendances", xaxis_title=None, title="Holt's linear method"
)
holt_fcst_fig.show()

Damped trend#

A linear trend may not always be the best forecast (it may be unreasonable to expect constant linear growth). We can adjust our model to include a damped trend simply by passing damped_trend=True. We won’t go into detail on this method, however, we introduce a new damping parameter, \(\phi\), where \(0 \leq \phi \leq 1\) (see fpp for more info).

holt_model_damped = Holt(endog=ts_data["avg_daily_attendances"], damped_trend=True).fit(
    smoothing_level=0.1, smoothing_trend=0.3
)
# Predict attendances during post-COVID period
holts_damped = holt_model_damped.predict(
    start=post_covid.index[1], end=post_covid.index[-1]
)
holts_damped.head()
holt_fcst_fig.add_trace(
    go.Scatter(x=holts_damped.index, y=holts_damped, name="Damped trend")
)
Double Exponential Smoothing key points
  • Extension of simple exponential smoothing

  • Suitable if you time series has clear trend, but no seasonality.

  • Introduces a new smoothing parameter (\(\beta\)) which allows you to adjust the weighting for the trend component.

  • Can also use a damped trend if linear trend is not expected (may be useful for long forecast horizons)

Task 2 (10-15 mins)

  1. Use Holt to make a linear trend forecast trained on the training_data dataset (use smoothing_level= 0.01 & smoothing_trend= 0.01 ).

# Answer here
  1. Plot the linear trend forecast alongside the test dataset.

# Answer here

Triple Exponential Smoothing (Holt-Winters seasonal method)#

Finally, we consider a model that can incoorperate both seasonal and trend components. Holt-Winters extendes Holt’s method, by adding a third smoothing equation for seasonality.

We therefore have three smoothing parameters, \(\alpha\) (for level), \(\beta\) (for trend), and \(\gamma\) (for seasonality).

Forecast equation: \(\hat{y}_{t + h | t} = l_{t} + hb_{t} + s_{t+h-m(k+1)}\), where k is the integer part from \(\frac{h-1}{m}\).

Level equation: \(l_{t} = \alpha (y_{t} - s_{t-m}) + (1 - \alpha)(l_{t-1} + b_{t-1})\)
Trend equation: \(b_{t} = \beta(l_{t} - l_{t-1}) + (1 - \beta)b_{t-1}\)
Seaonal equation: \(s_{t} = \gamma(y_{t} - l_{t-1} - b_{t-1}) + (1 - \gamma)s_{t-m}\)

We can also have a model for multiplicative seasonality, but we will just consider the additive model here.

The code gets a bit messier for triple exponential smoothing, so we won’t derive each component from scratch, as we did for previous models. This would, however, be a great exercise to understand the process in more depth if its something that interests you!

Lets use statsmodels to fit a Holt Winters model to our attendance data.

hw_model = ExponentialSmoothing(
    endog=ts_data["avg_daily_attendances"],
    seasonal_periods=12,
    trend="add",
    seasonal="add",
).fit(smoothing_level=0.6, smoothing_trend=0.1, smoothing_seasonal=0.05)

Lets plot each component

fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=("Level componant", "Trend componant", "Seasonal componant"),
)

fig.add_trace(
    go.Scatter(x=hw_model.level.index, y=hw_model.level.values, name="Level"),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(x=hw_model.trend.index, y=hw_model.trend.values, name="Trend"),
    row=1,
    col=2,
)

fig.add_trace(
    go.Scatter(x=hw_model.season.index, y=hw_model.season.values, name="Season"),
    row=1,
    col=3,
)

fig.update_layout(height=350, width=900, title_text="Componants of Holt-Winters model")
fig.show()
# Predict attendances during post-COVID period
hw_series = hw_model.predict(start=post_covid.index[1], end=post_covid.index[-1])
hw_series.head()
# Visualise Holt-Winters forecast
hw_series_fig = px.line(x=hw_series.index, y=hw_series)
hw_series_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
hw_series_fig.update_layout(
    yaxis_title="avg_daily_attendances", xaxis_title=None, title="Holt-Winters method"
)
hw_series_fig.show()

Our forecasts are starting to look good!

One major challange is estimating optimal values for \(\alpha\), \(\beta\),and \(\gamma\). Visualising the forecast or trial and error probably isnt the best approach to find the optimal values…

There are a number of techniques we could use (such as cross-validation and train-test splits), but luckily there is a framework that can automate this selection process for us!

The ExponenTialSmoothing (ETS) framework#

The ExponenTialSmoothing (ETS) framework provides automated selection of parameters (\(\alpha\), \(\beta\),and \(\gamma\)) and also allows you to extract probabalistic information from the model (i.e., a distribution of likely outcomes). Rather than providing a point forecast as we’ve previously done ETS returns prediction intervals!

It’s vital to use prediciton intervals, as there is no other way to show the level of certainty in a single point-forecast.

Fitting our data using ETSModel.

ets_model = ETSModel(
    endog=ts_data["avg_daily_attendances"],
    # All we need to do is specify the seasonal period,
    # And whether the trend/seasonal components are additive or multiplicative
    seasonal_periods=12,
    trend="add",
    seasonal="add",
).fit()
# Predict from trained model
ets_series = ets_model.predict(start=post_covid.index[1], end=post_covid.index[-1])
ets_series.head()
hw_series_fig = px.line(x=ets_series.index, y=ets_series)
hw_series_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)
hw_series_fig.update_layout(
    yaxis_title="avg_daily_attendances", xaxis_title=None, title="ETS modelling"
)
hw_series_fig.show()

We can see the parameters the model has selected

ets_model.summary()

Prediction uncertainty#

One way to extract uncertainty is to create multiple simulated forecasts.

sims = ets_model.simulate(nsimulations=47, anchor="end", repetitions=25)
sims.head()
# Lets visualise all simulations
hw_series_fig = px.line(x=ets_series.index, y=ets_series)

hw_series_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)

for sim_num, col in enumerate(sims.columns):
    hw_series_fig.add_trace(
        go.Scatter(
            x=sims.index,
            y=sims[col].values,
            line=dict(color="rgba(105, 105, 105, 0.1)"),
            name=f"sim_num ={sim_num}",
        )
    )

hw_series_fig.update_layout(
    yaxis_title="avg_daily_attendances",
    xaxis_title=None,
    title="ETS modelling (using simulation)",
)
hw_series_fig.show()

We could take quantiles of these simulations (i.e., 2.5th and 97.5th for 95% prediction intervals) or get them directly from statsmodels.

pred_df = ets_model.get_prediction(
    start="2020-03-01", end=raw_data.index[-1]
).summary_frame(alpha=0.025)
pred_df.head()
hw_series_fig = px.line(x=pred_df.index, y=pred_df["mean"])

hw_series_fig.add_trace(
    go.Scatter(
        x=ts_data.index, y=ts_data["avg_daily_attendances"], name="Historical data"
    )
)


hw_series_fig.add_trace(
    go.Scatter(
        x=pred_df.index,
        y=pred_df["pi_lower"].values,
        line=dict(color="rgba(105, 105, 105, 0.5)"),
        name="yhat_lower",
    )
)
hw_series_fig.add_trace(
    go.Scatter(
        x=pred_df.index,
        y=pred_df["pi_upper"].values,
        line=dict(color="rgba(105, 105, 105, 0.5)"),
        name="yhat_upper",
    )
)

hw_series_fig.update_layout(
    yaxis_title="avg_daily_attendances",
    xaxis_title=None,
    title="ETS modelling (prediction intervals)",
)
hw_series_fig.show()

Task 3 (20-30 mins)

  1. Use statsmodels ETSModel to train a ETS model using training_data and automatically estimate the best exponential smoothing model (set trend and seasonal both to 'add').

# Answer here
  1. Obtain the prediction, along with 95% prediction intervals, using the get_prediction() method, and plot the results. Use the code above if you’re unsure on this, or read the documentation.

# Answer here
  1. Create a dataframe that stores the testing data, the ETS forecast and any two benchmarking forecasts (all trained on the training data). How does the performance of ETS compare with the benchmarking models? Use MAPE (or another metric of your choice) to summarise model performance.

# Answer here