寓言能否调和层级时间序列，其中层级在不同节点的深度不同？

如何解决寓言能否调和层级时间序列，其中层级在不同节点的深度不同？

我正在使用寓言包来获取关于一组分层时间序列的预测。我想指定一个层次结构，该层次结构在所有节点上的深度都不相同。

实际示例：

时间序列B1和B2相加到时间序列M1。
时间序列M1和M2相加为时间序列T，该时间序列位于层次结构的顶部。
时间序列M2不是一组时间序列的总和；这是它自己的时间序列。

创建tsibble格式的小型随机数据集：


library(dplyr)
library(tsibble)
library(fable)

set.seed(1)
B1 <- rnorm(12,mean = 5) + (1:12)
B2 <- rnorm(12,mean = 5)
M2 <- rnorm(12,mean = 25)

ts_data <- tibble(value = c(B1,B2,M2),month = rep(yearmonth(paste("2020",1:12,sep="-")),3),B = c(rep("B1",12),rep("B2",rep("B3",12)),M = c(rep("M1",24),rep("M2",12))) %>%
  as_tsibble(key = c("B","M"),index = month)

在3个时间序列中分别估算单独的ARIMA模型，进行汇总和预测：

fcsts <- ts_data %>%
  # Specify hierarchy
  aggregate_key(M / B,value = sum(value)) %>%
  # Fit models
  model(arima = ARIMA(value)) %>%
  # Set up reconciliation
  mutate(mint = min_trace(arima)) %>%
  # Produce the forecasts
  forecast(h = 1)

我担心结果可能不正确的原因是，我可以创建一个病理示例，即使没有实际的汇总，对帐也会产生较小的置信区间：

病理示例：

时间序列B3是M2的子代，它是T的子代。

我通过对上一个子集进行设置来为此示例创建一个数据集：

ts_data_2 <- ts_data %>% 
  filter(B == "B3")

再次估算单独的ARIMA模型，进行汇总和预测：

fcsts_2 <- ts_data_2 %>%
  # Specify hierarchy
  aggregate_key(M / B,value = sum(value)) %>%
  # Fit models
  model(arima = ARIMA(value)) %>%
  # Set up reconciliation
  mutate(mint = min_trace(arima)) %>%
  # Produce the forecasts
  forecast(h = 6)

结果如下：

> fcsts_2
# A fable: 6 x 6 [1M]
# Key:     M,B,.model [6]
  M            B            .model    month value .distribution
  <chr>        <chr>        <chr>     <mth> <dbl> <dist>       
1 M2           B3           arima  2021 Jan  24.9 N(25,0.63)  
2 M2           <aggregated> arima  2021 Jan  24.9 N(25,0.63)  
3 <aggregated> <aggregated> arima  2021 Jan  24.9 N(25,0.63)  
4 M2           B3           mint   2021 Jan  24.9 N(25,0.21)  
5 M2           <aggregated> mint   2021 Jan  24.9 N(25,0.21)  
6 <aggregated> <aggregated> mint   2021 Jan  24.9 N(25,0.21)

即使没有实际的聚合，方差也从原始ARIMA模型中的0.63降低到0.21。当然，这是一个示例，其中根本不应该使用对帐，但是这里方差减小的事实使我担心，在实际示例中对帐无法正常工作。

有没有一种方法可以在实际示例中指定模型，以避免从B3到M2的聚合？（我尝试使用NA代替B列中的级别“ B3”，但这不起作用。）

解决方法

这里的问题是，对帐基于错误的协方差矩阵的估计，默认设置使用对角矩阵，当退化的层次结构存在时，对角矩阵会导致严重低估。

使用mint_shrink选项时，会得到更好的结果（尽管仍然有一些偏差）：

library(dplyr)
library(tsibble)
library(fable)

set.seed(1)
B1 <- rnorm(12,mean = 5) + (1:12) 
B2 <- rnorm(12,mean = 5) 
M2 <- rnorm(12,mean = 25) 

ts_data <- tibble(value = c(B1,B2,M2),month = rep(yearmonth(paste("2020",1:12,sep="-")),3),B = c(rep("B1",12),rep("B2",rep("B3",12)),M = c(rep("M1",24),rep("M2",12))) %>% 
  as_tsibble(key = c("B","M"),index = month) 
ts_data_2 <- ts_data %>%  
  filter(B == "B3") 
ts_data_2 %>% 
  # Specify hierarchy 
  aggregate_key(M / B,value = sum(value)) %>% 
  # Fit models 
  model(arima = ARIMA(value)) %>% 
  # Set up reconciliation 
  mutate(mint = min_trace(arima,method="mint_shrink")) %>% 
  # Produce the forecasts 
  forecast(h = 6)
#> # A fable: 36 x 6 [1M]
#> # Key:     M,B,.model [6]
#>    M     B     .model    month       value .mean
#>    <chr> <chr> <chr>     <mth>      <dist> <dbl>
#>  1 M2    B3    arima  2021 Jan N(25,0.63)  24.9
#>  2 M2    B3    arima  2021 Feb N(25,0.63)  24.9
#>  3 M2    B3    arima  2021 Mar N(25,0.63)  24.9
#>  4 M2    B3    arima  2021 Apr N(25,0.63)  24.9
#>  5 M2    B3    arima  2021 May N(25,0.63)  24.9
#>  6 M2    B3    arima  2021 Jun N(25,0.63)  24.9
#>  7 M2    B3    mint   2021 Jan N(25,0.56)  24.9
#>  8 M2    B3    mint   2021 Feb N(25,0.56)  24.9
#>  9 M2    B3    mint   2021 Mar N(25,0.56)  24.9
#> 10 M2    B3    mint   2021 Apr N(25,0.56)  24.9
#> # … with 26 more rows

^{由reprex package（v0.3.0）于2020-09-23创建}

通过寓言，寓言允许您删除不需要的聚合/非分类。要删除B3级别，您可以使用：

library(fpp3)
B1 <- rnorm(12,mean = 5) + (1:12)
B2 <- rnorm(12,mean = 5)
M2 <- rnorm(12,mean = 25)

ts_data <- tibble(value = c(B1,12))) %>%
  as_tsibble(key = c("B",index = month)

ts_data %>%
  # Specify hierarchy
  aggregate_key(M / B,value = sum(value)) %>% 
  # Remove redundant nodes from hierarchy (#226)
  filter(!(B == "B3")) %>% 
  # Fit models
  model(arima = ARIMA(value)) %>%
  # Set up reconciliation
  mutate(mint = min_trace(arima)) %>%
  # Produce the forecasts
  forecast(h = 1)
#> # A fable: 10 x 6 [1M]
#> # Key:     M,.model [10]
#>    M            B            .model    month        value .mean
#>    <chr>        <chr>        <chr>     <mth>       <dist> <dbl>
#>  1 M1           B1           arima  2021 Jan   N(19,1.9) 18.5 
#>  2 M1           B1           mint   2021 Jan   N(18,1.2) 17.6 
#>  3 M1           B2           arima  2021 Jan    N(4.9,1)  4.86
#>  4 M1           B2           mint   2021 Jan N(4.3,0.81)  4.26
#>  5 M1           <aggregated> arima  2021 Jan   N(20,3.6) 20.4 
#>  6 M1           <aggregated> mint   2021 Jan   N(22,1.2) 21.9 
#>  7 M2           <aggregated> arima  2021 Jan  N(25,0.62) 24.8 
#>  8 M2           <aggregated> mint   2021 Jan  N(25,0.56) 24.7 
#>  9 <aggregated> <aggregated> arima  2021 Jan     N(46,4) 45.8 
#> 10 <aggregated> <aggregated> mint   2021 Jan   N(47,1.4) 46.6

^{由reprex package（v0.3.0）于2020-09-23创建}

您尝试使用NA代替B3实际上接近于我在下一版本中添加的内容，以更好地支持不平衡的层次结构。我将导出agg_vec()，以允许您创建自定义的<aggregated>值，然后可以将其传递到aggregate_key()中。我还希望为aggregate_key()添加一个功能，以检测冗余节点，并可能将其删除（https://github.com/tidyverts/fabletools/issues/226）。