# LSTM-MSNet: Leveraging Forecasts on Sets of Related Time Series with Multiple Seasonal Patterns (Paper Summary and Implementation)

**Introduction**

Time series forecasting supports modern business planning by informing the short-term, medium-term and long-term goals of an organisation. Generating accurate and reliable forecasts is therefore important, because it can lead to savings and cost reductions.

The complex properties of a time series, such as seasonality, trend, and level, can make accurate forecasting challenging. In terms of seasonality, a time series may exhibit complex behaviour such as multiple seasonal patterns (e.g. daily, weekly, yearly), non-integer seasonality, and calendar effects.

Current methods for handling multiple seasonal patterns are mostly univariate statistical forecasting techniques. Thus, they treat each time series as an independent sequence of observations and forecast it in isolation.

Univariate forecasting cannot exploit the cross-series information available in a set of time series that may be correlated and share many common features.

Examples include sales demand for related product assortments in retail, server performance measures in computer centres, and household smart meter data. This applies, for instance, to the time series shown in Fig. , where the energy consumption patterns of various households can be similar and may share key properties.

- Generating forecasts for time series with multiple seasonal cycles is an important use-case for many industries nowadays.
- The current state-of-the-art methods in this space are typically univariate, estimating the model parameters of each time series independently. Consequently, these models cannot capture key patterns and structures that may be shared by a collection of time series.
- In contrast, LSTM-MSNet is a globally trained Long Short-Term Memory network (LSTM), in which a single prediction model is built across all the available time series to exploit the knowledge in a group of related time series. Furthermore, the methodology combines a series of state-of-the-art multi-seasonal decomposition techniques to supplement the LSTM learning procedure.

# Architecture of LSTM-MSNet

The architecture of LSTM-MSNet is a fusion of statistical decomposition techniques and recurrent neural networks. The LSTM-MSNet has three layers, namely:

- The pre-processing layer, which consists of a normalisation and variance-stabilisation phase and a seasonal decomposition phase,
- The recurrent layer, which consists of an LSTM-based stacking architecture to train the network, and
- A post-processing layer to de-normalise and re-seasonalise the time series to derive the final forecasts.

The proposed framework can be used with any RNN variant (e.g. LSTM, GRU).

# Layer 1: Normalisation and Variance Stabilisation Layer

Since each time series in a collection may contain observations with different value ranges, data normalisation is necessary. Hence, we use the mean-scale transformation strategy, which uses the mean of a time series as the scaling factor:

$$x_{i,\text{normalised}} = \frac{x_i}{\frac{1}{k}\sum_{t=1}^{k} x_{i,t}}$$

Here, $x_{i,\text{normalised}}$ represents the normalised observations of time series $i$, and $k$ represents the number of observations of time series $i$.
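The mean-scale step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function name `mean_scale` is hypothetical, and the guard against a zero mean is an added assumption. The scaling factor is returned because the post-processing layer needs it to map forecasts back to the original scale.

```python
from statistics import fmean

def mean_scale(series):
    """Divide every observation of one series by that series' mean.

    Returns the normalised series and the scaling factor (hypothetical helper).
    """
    scale = fmean(series)  # (1/k) * sum of the k observations
    if scale == 0:
        # Assumption: mean-scaling is left undefined for zero-mean series.
        raise ValueError("mean-scaling is undefined for a series with zero mean")
    return [x / scale for x in series], scale

# Two series with very different value ranges end up on a comparable scale.
household_a = [2.0, 4.0, 6.0]
household_b = [200.0, 400.0, 600.0]
norm_a, scale_a = mean_scale(household_a)
norm_b, scale_b = mean_scale(household_b)
```

Both households map to the same normalised shape, which is what lets a single global model learn from them jointly.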

After normalising the time series, we stabilise the variance in the group of time series by transforming each time series to a logarithmic scale.

Apart from stabilising the variance, the log transformation also converts multiplicative seasonality in a time series into an additive form. This is required by the additive time series decomposition techniques employed in the decomposition layer. The log transformation is defined as:

$$X_{i,\text{logscaled}} = \begin{cases} \log(X_i), & \min(X) > 0 \\ \log(X_i + 1), & \min(X) = 0 \end{cases}$$

Here, $X$ denotes a time series, and $X_{i,\text{logscaled}}$ is the corresponding log-transformed time series $i$.
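A minimal sketch of the variance-stabilising log step and its inverse is shown below. The helper names are hypothetical, and switching to `log(x + 1)` when a series contains zeros is an assumption made so that the transform stays defined. The short demo illustrates the additive-seasonality claim: a multiplicative pattern `trend * season` becomes `log(trend) + log(season)`.

```python
import math

def log_scale(series):
    """Log-transform one (already mean-scaled) series.

    Assumption: if the series contains zeros, log(x + 1) is used; otherwise
    the plain natural log. Returns the values and a flag for the branch used.
    """
    if min(series) < 0:
        raise ValueError("log-scaling expects non-negative observations")
    plus_one = min(series) == 0
    shift = 1.0 if plus_one else 0.0
    return [math.log(x + shift) for x in series], plus_one

def inverse_log_scale(values, plus_one):
    """Undo log_scale: exponentiate and remove the +1 shift if it was applied."""
    return [math.exp(v) - (1.0 if plus_one else 0.0) for v in values]

# A multiplicative pattern (trend * season) becomes additive after the log.
trend = [1.0, 2.0, 4.0]
season = [1.5, 0.5, 1.5]
series = [t * s for t, s in zip(trend, season)]
logged, _ = log_scale(series)
# logged[t] equals log(trend[t]) + log(season[t]) up to floating-point rounding
```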

# Layer 2: Seasonal Decomposition


When modelling seasonal time series with NNs, many studies suggest applying a prior seasonal adjustment, i.e., de-seasonalisation, to the time series.

Here, de-seasonalisation refers to the process of removing the seasonal component from a time series.

- The main intention of this approach is to reduce the complexity of the original time series by removing its multi-seasonal components, thereby reducing the subsequent effort of the NN's learning process.
- Here, multi-seasonal components refer to the repeating patterns in a time series, which may change slowly over time.

## Proposed Methods for Decomposition

- Multiple STL Decomposition (MSTL)
- Seasonal-Trend decomposition by Regression (STR)
- Trigonometric, Box-Cox, ARMA, Trend, Seasonal (TBATS)
- Prophet
- Fourier Transformation

- Of these, we use **Prophet** for decomposition.

## Prophet

Prophet is an automated forecasting framework developed by Taylor and Letham. Its main aim is to address the forecasting challenges at Facebook, the authors' employer at the time, which include forecasting time series with multiple seasonal cycles.

- Prophet uses an **additive decomposition** and also introduces an additional term to model holidays as seasonal covariates:

$$x_t = \sum_{i=1}^{n} \hat{S}_t^{i} + \hat{T}_t + \hat{H}_t + \hat{R}_t$$

Here, $x_t$ represents the observation at time $t$; $\hat{S}_t^{i}$, $\hat{T}_t$ and $\hat{R}_t$ refer to the $i$-th seasonal, trend and remainder components of the observation; $n$ denotes the number of distinct seasonal patterns decomposed; and $\hat{H}_t$ denotes the holiday covariates that represent the effects of holidays.
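Once the components of the additive decomposition are available, removing the seasonality is plain subtraction. The sketch below assumes the components have already been extracted and aligned with the series (for example, the per-seasonality outputs of a fitted Prophet model); the library call itself is not shown, and the helper names are hypothetical.

```python
def deseasonalise(x, seasonal_components):
    """Return x_t - sum_i S_t^i, i.e. what is left after removing all seasonalities.

    x: log-scaled observations (seasonality is additive in this space).
    seasonal_components: list of n sequences aligned with x,
    e.g. a daily and a weekly component.
    """
    return [xt - sum(s[t] for s in seasonal_components) for t, xt in enumerate(x)]

def remainder(x, seasonal_components, trend, holidays):
    """Recover R_t = x_t - sum_i S_t^i - T_t - H_t from the additive model."""
    ds = deseasonalise(x, seasonal_components)
    return [d - tr - h for d, tr, h in zip(ds, trend, holidays)]

x = [5, 6, 7]
daily = [1, -1, 0]
weekly = [0.5, 0.5, -1]
deseasonalised = deseasonalise(x, [daily, weekly])
residual = remainder(x, [daily, weekly], trend=[3, 6, 7], holidays=[0, 0, 1])
```

The deseasonalised series (trend, holiday and remainder) is what the DS training paradigm later feeds to the LSTM.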

# Layer 3: Recurrent Layer

- The Recurrent Layer is the primary prediction module of LSTM-MSNet and is built from LSTMs.
- In an LSTM, the gating mechanism together with the self-contained memory cell enables the network to capture nonlinear long-term temporal dependencies in a sequence.
- The paper uses a variant known as the “LSTM with peephole connections”, which configures the input and forget gates of the LSTM network to also take the previous state of the memory cell ($C_{t-1}$) into account.
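To make the peephole idea concrete, here is a single step of a scalar LSTM cell in which the input and forget gates also read the previous cell state `c_prev`. It is an illustrative sketch, not the paper's implementation: scalars replace weight matrices to stay dependency-free, the parameter names are hypothetical, and some peephole formulations additionally connect the new cell state to the output gate, which is omitted here.

```python
import math

def sigmoid(z):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))

def peephole_lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell with peephole input and forget gates.

    p maps hypothetical parameter names to floats; "pi" and "pf" are the
    peephole weights that let the gates see the previous cell state.
    """
    i_t = sigmoid(p["wi"] * x_t + p["ui"] * h_prev + p["pi"] * c_prev + p["bi"])
    f_t = sigmoid(p["wf"] * x_t + p["uf"] * h_prev + p["pf"] * c_prev + p["bf"])
    o_t = sigmoid(p["wo"] * x_t + p["uo"] * h_prev + p["bo"])
    c_tilde = math.tanh(p["wc"] * x_t + p["uc"] * h_prev + p["bc"])
    c_t = f_t * c_prev + i_t * c_tilde  # keep part of the old memory, add new content
    h_t = o_t * math.tanh(c_t)          # gated cell output
    return h_t, c_t

PARAM_NAMES = ["wi", "ui", "pi", "bi", "wf", "uf", "pf", "bf",
               "wo", "uo", "bo", "wc", "uc", "bc"]

# With all parameters at zero every gate is 0.5 and the candidate is 0,
# so the cell state halves: c == 0.5 here.
zero_params = {name: 0.0 for name in PARAM_NAMES}
h, c = peephole_lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, p=zero_params)
```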

## Moving Window Transformation

- As a preprocessing step, the past observations of a time series ($X_i$) are transformed into multiple pairs of input and output frames using a Moving Window (MW) strategy. These frames are then used as the primary training source of LSTM-MSNet.
- In summary, the MW strategy (with a stride of one) converts a time series $X_i$ of length $K$ into $(K - n - m + 1)$ records, each containing $(n + m)$ observations.
- Here, $m$ refers to the length of the output window (forecast period), and $n$ is the length of the input window (look-back period).
- These frames are generated according to the Multi-Input Multi-Output (MIMO) principle used in multi-step forecasting, which directly predicts all future observations up to the intended forecasting horizon.
- The input window length is set as a multiple of the output window length, $n = 1.5m$, following the paper.
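The moving window transformation can be sketched as follows. The helper name is hypothetical; each output window holds all `m` future values, so a model trained on these pairs predicts the whole horizon at once, as in MIMO.

```python
def moving_window(series, n, m):
    """Split one series into MIMO (input, output) pairs with a stride of one.

    n: input window (look-back) length; m: output window (forecast) length.
    Produces len(series) - n - m + 1 pairs.
    """
    if n < 1 or m < 1 or len(series) < n + m:
        raise ValueError("series too short for the requested windows")
    pairs = []
    for start in range(len(series) - n - m + 1):
        x = series[start:start + n]          # look-back period
        y = series[start + n:start + n + m]  # all m future values, predicted jointly
        pairs.append((x, y))
    return pairs

m = 4
n = int(1.5 * m)  # input window set to 1.5 times the output window
windows = moving_window(list(range(20)), n, m)
```

For a series of 20 points with `n = 6` and `m = 4`, this yields 11 pairs, the first being `([0, ..., 5], [6, 7, 8, 9])`.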

# Training Paradigms

This study proposes using the output of the decomposition layer in two different ways. These paradigms differ in the time series components used in the moving window process:

**Deseasonalised Approach (DS):**

This approach uses seasonally adjusted time series as MW patches to train LSTM-MSNet. Since the seasonal components are excluded from DS training, a reseasonalisation step is later applied in the post-processing layer of LSTM-MSNet to restore the corresponding multiple seasonal components of the time series.

**Seasonal Exogenous Approach (SE):**

This second approach uses the output of the pre-processing layer, together with the seasonal components extracted by the multi-seasonal decomposition as external variables. Since the original components of the time series are used in the training phase of SE, LSTM-MSNet is expected to forecast all the components of a time series, including the relevant multi-seasonal patterns. Therefore, SE does not require a reseasonalisation stage.

In summary,

- DS supplements LSTM-MSNet by excluding the seasonal factors from the training procedure, which reduces the overall training complexity of LSTM-MSNet.
- In contrast, SE supplements LSTM-MSNet with exogenous variables that help model the seasonal trajectories of a time series.
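The difference between the two paradigms lies in which per-step features enter the moving windows. The sketch below builds those features for both cases; the function names are hypothetical, and it assumes the series is already normalised and log-scaled with seasonal components aligned to it. In SE the training targets would still be windows of the series itself.

```python
def ds_inputs(x, seasonal_components):
    """DS: one feature per step, the seasonally adjusted value."""
    return [[xt - sum(s[t] for s in seasonal_components)] for t, xt in enumerate(x)]

def se_inputs(x, seasonal_components):
    """SE: the original value plus each seasonal component as an exogenous feature."""
    return [[xt] + [s[t] for s in seasonal_components] for t, xt in enumerate(x)]

x = [5, 6]
daily = [1, -1]
weekly = [0.5, 0.5]
ds_features = ds_inputs(x, [daily, weekly])  # one column per step
se_features = se_inputs(x, [daily, weekly])  # 1 + n columns per step
```

DS keeps the network input small because seasonality is handled outside the LSTM, whereas SE widens the input so the LSTM can learn the seasonal trajectories itself.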

# Deseasonalised Approach (DS)

- The DS paradigm can be seen as a boosting ensemble technique, where the deseasonalisation process is a weak base learner that is subsequently supplemented by the LSTM, which is trained on the residuals of the base learner.
- Here, the complexity of the base learner, i.e., the different deseasonalisation techniques, can affect the subsequent LSTM training procedure and may lead to different results.

## LSTM Learning Scheme

As highlighted earlier, we train LSTM-MSNet on the past observations of time series $X_i$, in the form of input and output windows generated using the moving window strategy.

Fig. illustrates the primary LSTM learning architecture of LSTM-MSNet. This consists of four components, namely: Training input window layer, LSTM stacking layer, Dense layer and Training output window layer.

## Local Normalization

Before feeding these windows to the network for training, each input and output window is subjected to a local normalisation process to avoid possible network saturation effects caused by the bounds of the network activation functions.

- In the DS approach, the trend component at the last time step of the input window is used as the local normalisation factor.
- This factor is subtracted from each data point in the corresponding input and output windows.
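A minimal sketch of this local normalisation is given below. The helper name is hypothetical, and it assumes the trend component is available and aligned with the input window. The factor is returned so that it can be added back in the post-processing layer.

```python
def local_normalise(x_window, y_window, trend_window):
    """Subtract the trend value at the last input step from both windows.

    trend_window: trend component aligned with x_window (from decomposition).
    Returns the normalised input window, output window and the factor used.
    """
    level = trend_window[-1]
    return [v - level for v in x_window], [v - level for v in y_window], level

x_norm, y_norm, level = local_normalise([1.0, 2.0, 3.0], [4.0, 5.0], [0.5, 1.0, 2.0])
```

Here the factor is 2.0, so the windows become `[-1.0, 0.0, 1.0]` and `[2.0, 3.0]`, values centred near zero where the activation functions are not saturated.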

## Loss Function

- The L1-norm is used as the primary learning objective; it minimises the absolute differences between the target values and the estimated values.
- The L1-loss is given by:

$$\mathcal{L} = \sum_{t \in \Omega_{\text{Train}}} \left\lVert Y_t - \hat{Y}_t \right\rVert_1$$

Here, $Y_t \in \mathbb{R}^m$ refers to the actual observations in the output window at time step $t$, and $\hat{Y}_t$ is the corresponding cell output of the LSTM at time step $t$. $\Omega_{\text{Train}}$ is the set of time steps used for training. We also include an L2-regularisation term to minimise possible over-fitting of the network.
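The objective can be sketched as an L1 term summed over all training output windows plus an L2 penalty on the weights. The exact form of the regularisation term (here `l2_lambda * sum(w**2)`) and the names are assumptions for illustration.

```python
def l1_loss_with_l2(y_true, y_pred, weights, l2_lambda):
    """Sum over training steps of ||Y_t - Yhat_t||_1, plus an L2 weight penalty.

    y_true, y_pred: lists of output windows (each a list of m values).
    weights: flat list of network weights; l2_lambda: hypothetical strength.
    """
    l1 = sum(abs(a - b) for yt, yp in zip(y_true, y_pred) for a, b in zip(yt, yp))
    l2 = l2_lambda * sum(w * w for w in weights)
    return l1 + l2

loss = l1_loss_with_l2([[1.0, 2.0], [3.0, 4.0]],
                       [[1.5, 2.0], [2.0, 4.0]],
                       weights=[1.0, 2.0], l2_lambda=0.1)
```

The absolute errors sum to 1.5 and the penalty adds 0.1 × (1 + 4) = 0.5.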

# Post-Processing Layer

- Reseasonalisation and renormalisation are the main components of the post-processing layer in LSTM-MSNet.
- In the reseasonalisation stage, the relevant seasonal components of the time series are added to the forecasts generated by the LSTM. These seasonal values are obtained by repeating the last seasonal components of the time series across the intended forecast horizon.
- Next, in the renormalisation phase, the forecasts are back-transformed to their original scale by adding back the corresponding local normalisation factor and taking the exponent of the values.
- The final forecasts are obtained by multiplying this vector by the scaling factor used in the normalisation process.
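The post-processing steps for the DS paradigm can be sketched as below. The helper names are hypothetical, and two assumptions are made: the last seasonal values are extended by repeating each component's final full cycle of known length `period`, and the plain log was used during pre-processing (if a `log(x + 1)` variant was used, 1 would also be subtracted after exponentiating).

```python
import math

def repeat_last_cycle(component, period, horizon):
    """Extend one seasonal component by repeating its last full cycle."""
    last_cycle = component[-period:]
    return [last_cycle[h % period] for h in range(horizon)]

def postprocess(lstm_output, seasonal_components, periods, level, scale):
    """Reseasonalise, renormalise and rescale DS forecasts.

    lstm_output: m forecasts in the locally normalised, log, deseasonalised space.
    seasonal_components / periods: in-sample seasonal components (log space)
    and their cycle lengths. level: local normalisation factor.
    scale: mean-scale factor of the series.
    """
    horizon = len(lstm_output)
    forecast = list(lstm_output)
    for comp, period in zip(seasonal_components, periods):
        future = repeat_last_cycle(comp, period, horizon)
        forecast = [f + s for f, s in zip(forecast, future)]  # reseasonalise
    forecast = [math.exp(f + level) for f in forecast]        # undo local norm and log
    return [f * scale for f in forecast]                      # undo mean scaling

daily = [0.1, -0.1, 0.2, -0.2]  # toy component with a cycle length of 2
final = postprocess([0.0, 0.0, 0.0], [daily], [2], level=0.0, scale=10.0)
```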

# Error Metrics

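- Symmetric Mean Absolute Percentage Error (sMAPE)
- Mean Absolute Scaled Error (MASE)

The sMAPE and MASE are defined as follows:

$$\text{sMAPE} = \frac{2}{m}\sum_{t=1}^{m}\frac{|F_t - Y_t|}{|Y_t| + |F_t|}$$

$$\text{MASE} = \frac{\frac{1}{m}\sum_{t=1}^{m}|F_t - Y_t|}{\frac{1}{n-S}\sum_{t=S+1}^{n}|Y_t - Y_{t-S}|}$$

Here, $Y_t$ represents the observation at time $t$ and $F_t$ is the generated forecast. $m$ denotes the number of data points in the test set, $n$ is the number of observations in the training set of a time series, and $S$ is the seasonal period used in the MASE denominator. For the M4 dataset, $S$ is 168.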
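Both metrics are straightforward to compute for a single series. The sketch below follows the standard definitions with a seasonal-naive denominator for MASE; the function names are illustrative, and sMAPE is left as a fraction rather than a percentage. Note that sMAPE is undefined when an observation and its forecast are both zero.

```python
def smape(y_true, y_pred):
    """sMAPE = (2/m) * sum |F_t - Y_t| / (|Y_t| + |F_t|) over the m test points."""
    m = len(y_true)
    return 2.0 / m * sum(abs(f - y) / (abs(y) + abs(f)) for y, f in zip(y_true, y_pred))

def mase(y_train, y_true, y_pred, S):
    """Forecast MAE scaled by the in-sample MAE of a seasonal naive forecast (period S)."""
    n = len(y_train)
    if n <= S:
        raise ValueError("training series must be longer than the seasonal period S")
    naive_mae = sum(abs(y_train[t] - y_train[t - S]) for t in range(S, n)) / (n - S)
    mae = sum(abs(f - y) for y, f in zip(y_true, y_pred)) / len(y_true)
    return mae / naive_mae

s = smape([100.0], [110.0])                            # 2 * 10 / 210
q = mase([1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [6.0, 6.0], S=1)
```

A MASE below 1 means the forecast beats the in-sample seasonal naive benchmark on average.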

# Conclusion

This paper presents the LSTM-MSNet methodology, a novel three-layered forecasting framework capable of forecasting a group of related time series with multiple seasonal cycles. The methodology is based on time series decomposition and LSTM recurrent neural networks, and aims to overcome the limitations of current univariate state-of-the-art models by training a unified model that exploits key structures, behaviours, and patterns common within a group of time series.

As possible future work, a hybrid version of this approach could be introduced to handle seasonalities in longer time series. Here, the deseasonalised approach could model shorter seasonalities, whereas the seasonal exogenous approach could address longer ones.