Reconciliation of structured time series forecasts with graphs
27th June 2023 @ ISF 2023
Mitchell O’Hara-Wild, Monash University
Supervised by Rob Hyndman and George Athanasopoulos
How many forecasters will attend ISF 2024 and beyond?
Forecast \(\text{Attendees}_{T+h|T}\) with a suitable model and data.
How many attendees are from academia and industry?
Forecast \(\text{Academic}_{T+h|T}\) and \(\text{Industry}_{T+h|T}\) with
suitable models and data.
Something doesn’t add up here…
Independently produced forecasts are incoherent,
\(\text{Attendees}_{T+h|T} \neq \text{Academic}_{T+h|T} + \text{Industry}_{T+h|T}\).
Impose constraints to ensure coherency
Adjust the forecasts to satisfy the constraint
\(\text{Attendees}_{T+h|T} = \text{Academic}_{T+h|T} + \text{Industry}_{T+h|T}\).
Often we have many constraints, so matrices are used:
\[ \begin{bmatrix} \text{Attendees}_{t} \\ \text{Academic}_{t} \\ \text{Industry}_{t} \\ \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0\\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \text{Academic}_{t} \\ \text{Industry}_{t} \\ \end{bmatrix} \]
or compactly, \(\mathbf{y}_t = \mathbf{S} \mathbf{b}_t\)
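A minimal NumPy sketch of \(\mathbf{y}_t = \mathbf{S}\mathbf{b}_t\) for the attendees example. The bottom-level values are made up for illustration and are not real ISF attendance figures.

```python
import numpy as np

# Summing matrix for the attendees example:
# rows = [Attendees, Academic, Industry], columns = bottom series [Academic, Industry]
S = np.array([
    [1, 1],  # Attendees = Academic + Industry
    [1, 0],  # Academic
    [0, 1],  # Industry
])

# Hypothetical bottom-level values b_t (illustrative only)
b_t = np.array([300, 150])

# Every coherent vector of all series has the form y_t = S b_t
y_t = S @ b_t
print(y_t)  # [450 300 150]
```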
“Summing” or “structural” matrices are described in Hyndman et al. (2011).
These matrices are not easy to read, so we use graphs.
The edge weights correspond to the entries of the \(\mathbf{S}\) matrix.
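A sketch of how \(\mathbf{S}\) can be read off a weighted graph. The plain-dict `edges` structure and the `row` helper are hypothetical illustrations, not part of any particular package. Each node's row of \(\mathbf{S}\) is the weighted sum of its children's rows, so in multi-level structures the product of edge weights along each path to a bottom series accumulates in \(\mathbf{S}\). The recursion assumes the graph has no cycles.

```python
from functools import lru_cache

import numpy as np

# Hypothetical edge list: each aggregate maps to the series it is made of,
# and each edge weight is that series' coefficient in the aggregate.
edges = {
    "Attendees": [("Academic", 1), ("Industry", 1)],
}
nodes = ["Attendees", "Academic", "Industry"]

# Bottom series are the nodes with no outgoing edges.
bottom = [n for n in nodes if not edges.get(n)]


@lru_cache(maxsize=None)
def row(node):
    """Row of S for `node` (assumes an acyclic graph; a cycle would recurse without end)."""
    if node in bottom:
        # A bottom series is just itself: a unit vector.
        return tuple(1 if node == b else 0 for b in bottom)
    total = [0] * len(bottom)
    for child, weight in edges[node]:
        # Add the child's row, scaled by the edge weight.
        for j, value in enumerate(row(child)):
            total[j] += weight * value
    return tuple(total)


S = np.array([row(n) for n in nodes])
print(S)
# [[1 1]
#  [1 0]
#  [0 1]]
```

Adding another layer is just another entry in `edges`; the function itself does not change.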
Reconciling forecasts
The L2-optimal way to adjust the forecasts to be coherent is MinT (minimum trace) reconciliation (Wickramasuriya, Athanasopoulos, and Hyndman 2018):
\[ \tilde{\mathbf{y}}_{T+h|T}=\mathbf{S}(\mathbf{S}'\mathbf{W}_{h}^{-1}\mathbf{S})^{-1}\mathbf{S}'\mathbf{W}_{h}^{-1}\hat{\mathbf{y}}_{T+h|T}, \]
where \(\mathbf{W}_{h}=\text{Var}[\mathbf{y}_{t+h}-\hat{\mathbf{y}}_{t+h|t}]\) is the covariance matrix of the \(h\)-step-ahead base forecast errors.
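A direct NumPy transcription of the MinT formula, as a sketch. In practice \(\mathbf{W}_h\) has to be estimated (for example from in-sample residuals). Here it is assumed to be the identity, which reduces MinT to OLS reconciliation. The base forecasts are hypothetical and deliberately incoherent.

```python
import numpy as np


def mint(y_hat, S, W):
    """MinT reconciliation: S (S' W^-1 S)^-1 S' W^-1 y_hat."""
    W_inv = np.linalg.inv(W)
    # G = (S' W^-1 S)^-1 S' W^-1, computed with a linear solve
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G @ y_hat


S = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)

# Hypothetical incoherent base forecasts: 460 != 300 + 150
y_hat = np.array([460.0, 300.0, 150.0])

# Assumed error covariance; W = I gives OLS reconciliation
W = np.eye(3)

y_tilde = mint(y_hat, S, W)
print(y_tilde)  # approx [456.67 303.33 153.33]
```

With \(\mathbf{W}_h = \mathbf{I}\), the discrepancy of 10 is spread equally: the total moves down by 10/3 and each component moves up by 10/3.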
There are many other reconciliation techniques and formulations that also work well.
Not only are coherent forecasts more reasonable, they are more accurate!
These matrices quickly become large and hard to read for large collections of coherent time series.
Let’s instead consider graphs.
Each aggregate has a single constraint
The basic constraint shown before is ‘hierarchical’
Hierarchical series often have multiple layers
In graph terms, this is known as a polytree.
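A sketch of a polytree check, assuming edges are given as (parent, child) pairs. A directed graph is a polytree when its underlying undirected graph is a tree, meaning it has exactly \(n - 1\) edges and no undirected cycle. The node names are hypothetical.

```python
def is_polytree(nodes, edges):
    """Return True if the underlying undirected graph is a tree.

    `edges` is a list of (parent, child) pairs. A tree on n nodes has exactly
    n - 1 edges and no cycles once edge directions are ignored.
    """
    if len(edges) != len(nodes) - 1:
        return False
    root = {n: n for n in nodes}  # union-find parent pointers

    def find(n):
        # Follow pointers to the set representative (with path halving)
        while root[n] != n:
            root[n] = root[root[n]]
            n = root[n]
        return n

    for parent, child in edges:
        a, b = find(parent), find(child)
        if a == b:
            return False  # this edge closes an undirected cycle
        root[a] = b       # merge the two components
    return True


# Hypothetical two-layer hierarchy: every series has a single parent
hierarchy_nodes = ["Total", "A", "B", "AA", "AB", "BA", "BB"]
hierarchy_edges = [("Total", "A"), ("Total", "B"),
                   ("A", "AA"), ("A", "AB"), ("B", "BA"), ("B", "BB")]

# Hypothetical grouped structure: attendance by workplace and by origin
grouped_nodes = ["Attendees", "Academic", "Industry", "Domestic", "International",
                 "AcDom", "AcInt", "InDom", "InInt"]
grouped_edges = [("Attendees", "Academic"), ("Attendees", "Industry"),
                 ("Attendees", "Domestic"), ("Attendees", "International"),
                 ("Academic", "AcDom"), ("Academic", "AcInt"),
                 ("Industry", "InDom"), ("Industry", "InInt"),
                 ("Domestic", "AcDom"), ("Domestic", "InDom"),
                 ("International", "AcInt"), ("International", "InInt")]

print(is_polytree(hierarchy_nodes, hierarchy_edges))  # True
print(is_polytree(grouped_nodes, grouped_edges))      # False
```

The grouped structure fails the test because each bottom series is reached by two different paths, which creates undirected cycles.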
There are many ways to disaggregate a series.
Consider where attendees have travelled from: domestic or international?
What about whether attendees work in academia or industry?
Let’s consider the combinations.
Considering origin and workplace
Attendance can be disaggregated by both origin and workplace…
and then further disaggregated by the other.
A grouped structure has the same top and bottom series.
The grouped structure can be plotted in a single graph.
In graph terms, this is a directed acyclic graph (DAG).
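For two crossed factors, the grouped \(\mathbf{S}\) can be sketched with Kronecker products. This assumes the four bottom series are ordered by workplace, then origin. The ordering is an illustrative choice.

```python
import numpy as np

# Bottom series, ordered by workplace then origin:
# [Academic-Domestic, Academic-International, Industry-Domestic, Industry-International]
ones = np.ones((1, 2), dtype=int)
I2 = np.eye(2, dtype=int)

S = np.vstack([
    np.kron(ones, ones),   # Attendees (total of all four)
    np.kron(I2, ones),     # Academic, Industry (sum over origin)
    np.kron(ones, I2),     # Domestic, International (sum over workplace)
    np.eye(4, dtype=int),  # the four bottom series themselves
])
print(S.shape)  # (9, 4)
```

Both the workplace rows and the origin rows sum to the same top series, and both use the same bottom series. That shared top and bottom is what makes the structure grouped.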
A time series can be disaggregated by temporal granularity
Temporal reconciliation is described in Athanasopoulos et al. (2017).
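A sketch of a temporal summing matrix for one year of quarterly data. It assumes every aggregation level `k` (the number of observations summed) divides the number of observations per cycle `m`. The `temporal_S` function name is hypothetical.

```python
import numpy as np


def temporal_S(m, levels):
    """Summing matrix for one cycle of m high-frequency observations.

    `levels` lists how many observations each temporal level sums, e.g.
    [1, 2, 4] for quarterly data with semi-annual and annual totals.
    """
    blocks = []
    for k in sorted(levels, reverse=True):  # coarsest level first
        if m % k:
            raise ValueError(f"level {k} does not divide {m}")
        # m // k non-overlapping totals, each summing k consecutive observations
        blocks.append(np.kron(np.eye(m // k, dtype=int), np.ones((1, k), dtype=int)))
    return np.vstack(blocks)


S_te = temporal_S(4, [1, 2, 4])
print(S_te)
# [[1 1 1 1]
#  [1 1 0 0]
#  [0 0 1 1]
#  [1 0 0 0]
#  [0 1 0 0]
#  [0 0 1 0]
#  [0 0 0 1]]
```

As another example, `temporal_S(12, [1, 3, 4, 12])` mixes quarterly and four-monthly totals of monthly data. These levels do not nest, so the resulting structure is grouped rather than hierarchical.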
What type of coherence structure is this?
This is a polytree, so this structure is hierarchical.
What type of coherence structure is this?
This structure has the same top and bottom series, so
temporal coherence is a grouped constraint.
Temporal coherence constraints are grouped, so they can also be represented with directed acyclic graphs (DAGs).
Since both grouped and temporal coherence are DAGs, they can be combined into a single DAG.
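One way to sketch the combined structure is to take the Kronecker product of the two summing matrices, assuming the cross-sectional and temporal constraints act independently. Each bottom series is then a cross-sectional bottom series at the highest frequency, which here means Academic and Industry for each quarter.

```python
import numpy as np

# Cross-sectional S: Attendees = Academic + Industry
S_cs = np.array([[1, 1], [1, 0], [0, 1]])

# Temporal S for quarterly data: annual, two semi-annual totals, four quarters
S_te = np.vstack([
    np.ones((1, 4), dtype=int),
    np.kron(np.eye(2, dtype=int), np.ones((1, 2), dtype=int)),
    np.eye(4, dtype=int),
])

# One summing matrix covering every (series, temporal level) pair.
# Bottom vector ordering: [Academic Q1..Q4, Industry Q1..Q4]
S_ct = np.kron(S_cs, S_te)
print(S_ct.shape)  # (21, 8)
```

Each block of 7 rows holds one cross-sectional series at every temporal level. Any vector of the form `S_ct @ b` is therefore coherent in both dimensions at once.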
A directed acyclic graph does not require a common top and bottom series.
DAGs can describe more general structures than grouped coherence.
Is it reasonable to leverage the full generality of DAGs?
Yes! Let’s see why.
What if the coherency structure had different bottom series?
This often occurs in these circumstances:
Example
Suppose Sales is reported quarterly, but Profit and Costs are reported twice yearly.
This allows the higher-frequency Sales data to be used with the less frequent Profit and Costs data!
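A sketch of a summing matrix whose bottom series have different frequencies. It assumes, for illustration only, that Profit = Sales - Costs, so Costs enter with weight -1. The slide does not state this relationship. The base forecasts are hypothetical, and reconciliation uses OLS (MinT with \(\mathbf{W}_h = \mathbf{I}\)).

```python
import numpy as np

# Bottom series (mixed frequency): Sales Q1..Q4 (quarterly), Costs H1, H2 (half-yearly).
# ASSUMPTION for illustration: Profit = Sales - Costs.
S = np.array([
    # Q1  Q2  Q3  Q4  C1  C2
    [ 1,  1,  0,  0, -1,  0],  # Profit H1
    [ 0,  0,  1,  1,  0, -1],  # Profit H2
    [ 1,  1,  0,  0,  0,  0],  # Sales H1
    [ 0,  0,  1,  1,  0,  0],  # Sales H2
    [ 0,  0,  0,  0,  1,  0],  # Costs H1
    [ 0,  0,  0,  0,  0,  1],  # Costs H2
    [ 1,  0,  0,  0,  0,  0],  # Sales Q1
    [ 0,  1,  0,  0,  0,  0],  # Sales Q2
    [ 0,  0,  1,  0,  0,  0],  # Sales Q3
    [ 0,  0,  0,  1,  0,  0],  # Sales Q4
], dtype=float)

# Hypothetical, independently produced base forecasts for all 10 series
y_hat = np.array([40, 45, 110, 120, 65, 70, 52, 60, 58, 63], dtype=float)

# OLS reconciliation: least-squares projection of y_hat onto the columns of S
b_tilde = np.linalg.lstsq(S, y_hat, rcond=None)[0]
y_tilde = S @ b_tilde
```

Because the quarterly Sales bottom series appear in the Profit rows, the projection lets the Sales forecasts adjust the half-yearly Profit and Costs forecasts, and the reverse also holds.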
Example
Australian GDP is calculated with 3 approaches: production, income and expenditure.
For simplicity consider a small part of these graphs. The complete graph structure has many more disaggregates.
This example is used in Athanasopoulos et al. (2020).
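When the same total has several decompositions, it can be simpler to write the constraints as \(\mathbf{C}\mathbf{y}_t = \mathbf{0}\) than to pick one set of bottom series. This sketch uses a zero-constraint (projection) form of the MinT optimisation problem, \(\tilde{\mathbf{y}} = \hat{\mathbf{y}} - \mathbf{W}\mathbf{C}'(\mathbf{C}\mathbf{W}\mathbf{C}')^{-1}\mathbf{C}\hat{\mathbf{y}}\). This form minimises \((\tilde{\mathbf{y}}-\hat{\mathbf{y}})'\mathbf{W}^{-1}(\tilde{\mathbf{y}}-\hat{\mathbf{y}})\) subject to the constraints. The component names, forecasts and \(\mathbf{W} = \mathbf{I}\) are hypothetical.

```python
import numpy as np


def reconcile_constraints(y_hat, C, W):
    """Minimise (y - y_hat)' W^-1 (y - y_hat) subject to C y = 0."""
    adjustment = W @ C.T @ np.linalg.solve(C @ W @ C.T, C @ y_hat)
    return y_hat - adjustment


# Series: [GDP, Inc1, Inc2, Exp1, Exp2, Exp3] (hypothetical components)
C = np.array([
    [1, -1, -1,  0,  0,  0],  # GDP = sum of income components
    [1,  0,  0, -1, -1, -1],  # GDP = sum of expenditure components
], dtype=float)

y_hat = np.array([100, 55, 40, 60, 25, 20], dtype=float)  # hypothetical base forecasts
W = np.eye(6)  # assumed error covariance

y_tilde = reconcile_constraints(y_hat, C, W)
print(np.round(C @ y_tilde, 10))  # both constraints are now satisfied
```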
What does having different top series mean?
There must be multiple top series.
This often happens if the graph is disjoint.
This can occur for many reasons:
It makes no sense to aggregate folds of cross-validation.
A suitable DAG for cross-validated hierarchies has one disjoint copy of the hierarchy for each fold.
Each disjoint graph can be reconciled separately.
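A sketch of why this works, assuming no error correlation is modelled between components (a block-diagonal \(\mathbf{W}_h\)). The joint summing matrix is then block diagonal, and reconciling everything at once gives the same result as reconciling each fold on its own. The forecasts are hypothetical.

```python
import numpy as np


def mint(y_hat, S, W):
    """MinT reconciliation: S (S' W^-1 S)^-1 S' W^-1 y_hat."""
    W_inv = np.linalg.inv(W)
    return S @ np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv @ y_hat)


S_fold = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)
zeros = np.zeros_like(S_fold)

# Two cross-validation folds of the same hierarchy form a disjoint graph,
# so the joint summing matrix is block diagonal.
S_joint = np.block([[S_fold, zeros], [zeros, S_fold]])

# Hypothetical base forecasts for each fold
y_hat_1 = np.array([460.0, 300.0, 150.0])
y_hat_2 = np.array([480.0, 310.0, 160.0])

W = np.eye(3)  # assumed; no error correlation between folds
joint = mint(np.concatenate([y_hat_1, y_hat_2]), S_joint, np.eye(6))
separate = np.concatenate([mint(y_hat_1, S_fold, W), mint(y_hat_2, S_fold, W)])
print(np.allclose(joint, separate))  # True
```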
DAGs are reasonable (and very useful!)
Graph coherence allows us to describe more general coherence structures, including
Coherence and graph theory
DAGs are a useful tool for representing structured time series and producing coherent forecasts.
Other benefits
Future work