Let's start by loading the fpp3 package (https://github.com/robjhyndman/fpp3-package) and the US consumption expenditure dataset (https://rdrr.io/cran/fpp3/man/us_change.html):
library(fpp3)
us_change <- readr::read_csv("https://otexts.com/fpp3/extrafiles/us_change.csv") %>%
  mutate(Time = yearquarter(Time)) %>%
  as_tsibble(index = Time)
Suppose we want to forecast changes in consumption from changes in income, so we use Income as the predictor.
Let's start with a simple linear regression model:
fit_lm <- us_change %>% model(TSLM(Consumption ~ Income))
Model report:
> report(fit_lm)
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-2.40845 -0.31816 0.02558 0.29978 1.45157
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54510 0.05569 9.789 < 2e-16 ***
Income 0.28060 0.04744 5.915 1.58e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6026 on 185 degrees of freedom
Multiple R-squared: 0.159, Adjusted R-squared: 0.1545
F-statistic: 34.98 on 1 and 185 DF, p-value: 1.5774e-08
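For reference, TSLM(Consumption ~ Income) is the ordinary linear regression below. Here y_t stands for Consumption and x_t for Income; this notation is mine, not fable's. Its residuals are the observed values minus the fitted line, using the estimates from the report above:

$$
y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad
e_t = y_t - \hat\beta_0 - \hat\beta_1 x_t = y_t - 0.54510 - 0.28060\,x_t .
$$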
A sample of the residuals:
> residuals(fit_lm)
# A tsibble: 187 x 3 [1Q]
# Key: .model [1]
.model Time .resid
<chr> <qtr> <dbl>
1 TSLM(Consumption ~ Income) 1970 Q1 -0.202
2 TSLM(Consumption ~ Income) 1970 Q2 -0.413
3 TSLM(Consumption ~ Income) 1970 Q3 -0.104
4 TSLM(Consumption ~ Income) 1970 Q4 -0.748
5 TSLM(Consumption ~ Income) 1971 Q1 0.795
6 TSLM(Consumption ~ Income) 1971 Q2 -0.0392
7 TSLM(Consumption ~ Income) 1971 Q3 0.100
8 TSLM(Consumption ~ Income) 1971 Q4 0.778
9 TSLM(Consumption ~ Income) 1972 Q1 0.640
10 TSLM(Consumption ~ Income) 1972 Q2 1.06
# ... with 177 more rows
Now suppose the regression errors are autocorrelated, so we use a regression model with ARIMA errors:
fit_reg_arima_errs <- us_change %>% model(ARIMA(Consumption ~ Income))
Model report:
> report(fit_reg_arima_errs)
Series: Consumption
Model: LM w/ ARIMA(1,0,2) errors
Coefficients:
ar1 ma1 ma2 Income intercept
0.6922 -0.5758 0.1984 0.2028 0.5990
s.e. 0.1159 0.1301 0.0756 0.0461 0.0884
sigma^2 estimated as 0.3219: log likelihood=-156.95
AIC=325.91 AICc=326.37 BIC=345.29
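In this case there are two types of residuals: the residuals of the regression part (type = "regression") and the residuals of the ARIMA part (type = "innovation").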
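Written out, this is the fitted model and the two residual types as I understand them. I use \eta_t for the regression error and \varepsilon_t for the white-noise ARIMA error. The MA terms follow R's sign convention (plus signs). The coefficient values come from the report above, and the innovation recursion ignores start-up values:

$$
\begin{aligned}
y_t &= \beta_0 + \beta_1 x_t + \eta_t, \\
\eta_t &= \phi_1 \eta_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}, \qquad \varepsilon_t \sim \mathrm{WN}(0, \sigma^2), \\
\hat\eta_t &= y_t - \hat\beta_0 - \hat\beta_1 x_t = y_t - 0.5990 - 0.2028\,x_t \quad \text{(regression residuals)}, \\
\hat\varepsilon_t &= \hat\eta_t - \hat\phi_1 \hat\eta_{t-1} - \hat\theta_1 \hat\varepsilon_{t-1} - \hat\theta_2 \hat\varepsilon_{t-2} \quad \text{(innovation residuals)},
\end{aligned}
$$

with \hat\phi_1 = 0.6922, \hat\theta_1 = -0.5758, \hat\theta_2 = 0.1984 and \hat\sigma^2 = 0.3219.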
A sample of the regression residuals:
> regression_errors = residuals(fit_reg_arima_errs, type="regression") %>% print()
# A tsibble: 187 x 3 [1Q]
# Key: .model [1]
.model Time .resid
<chr> <qtr> <dbl>
1 ARIMA(Consumption ~ Income) 1970 Q1 0.616
2 ARIMA(Consumption ~ Income) 1970 Q2 0.460
3 ARIMA(Consumption ~ Income) 1970 Q3 0.877
4 ARIMA(Consumption ~ Income) 1970 Q4 -0.274
5 ARIMA(Consumption ~ Income) 1971 Q1 1.90
6 ARIMA(Consumption ~ Income) 1971 Q2 0.912
7 ARIMA(Consumption ~ Income) 1971 Q3 0.795
8 ARIMA(Consumption ~ Income) 1971 Q4 1.65
9 ARIMA(Consumption ~ Income) 1972 Q1 1.31
10 ARIMA(Consumption ~ Income) 1972 Q2 1.89
# ... with 177 more rows
A sample of the ARIMA (innovation) residuals:
> ARIMA_errors = residuals(fit_reg_arima_errs, type="innovation") %>% print()
# A tsibble: 187 x 3 [1Q]
# Key: .model [1]
.model Time .resid
<chr> <qtr> <dbl>
1 ARIMA(Consumption ~ Income) 1970 Q1 -0.167
2 ARIMA(Consumption ~ Income) 1970 Q2 -0.320
3 ARIMA(Consumption ~ Income) 1970 Q3 0.0720
4 ARIMA(Consumption ~ Income) 1970 Q4 -0.694
5 ARIMA(Consumption ~ Income) 1971 Q1 1.05
6 ARIMA(Consumption ~ Income) 1971 Q2 0.142
7 ARIMA(Consumption ~ Income) 1971 Q3 -0.0525
8 ARIMA(Consumption ~ Income) 1971 Q4 0.695
9 ARIMA(Consumption ~ Income) 1972 Q1 0.469
10 ARIMA(Consumption ~ Income) 1972 Q2 0.788
# ... with 177 more rows
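To check my understanding of how the two residual types relate, here is a minimal base R sketch. It uses stats::arima rather than fable, on simulated data; the coefficient values, seed and variable names are arbitrary assumptions, not taken from us_change.

The sketch first computes the regression residuals by subtracting only the regression part from the response. It then runs them through the inverse ARMA filter with zero start-up values and compares the result with the innovation residuals returned by arima():

```r
# Hypothetical simulation: all "true" values below are arbitrary assumptions
set.seed(123)
n <- 187
income <- rnorm(n, mean = 0.7, sd = 0.9)                 # stand-in predictor
eta_true <- as.numeric(arima.sim(model = list(ar = 0.5, ma = c(0.4, 0.2)),
                                 n = n, sd = 0.5))        # ARMA(1,2) regression errors
consumption <- 0.6 + 0.2 * income + eta_true             # stand-in response

# Regression with ARIMA(1,0,2) errors; a named one-column xreg keeps the
# coefficient name "income" (arima adds "intercept" because d = 0)
fit <- arima(consumption, order = c(1, 0, 2), xreg = cbind(income = income))
cf <- coef(fit)

# Regression residuals: response minus the regression part only
eta_hat <- consumption - cf[["intercept"]] - cf[["income"]] * income

# Innovation residuals as returned by arima()
innov <- as.numeric(residuals(fit))

# Approximately recover the innovations by filtering eta_hat through the
# inverse ARMA(1,2) recursion, using zero initial values
e_manual <- numeric(n)
for (i in seq_len(n)) {
  eta_lag1 <- if (i > 1) eta_hat[i - 1] else 0
  e_lag1   <- if (i > 1) e_manual[i - 1] else 0
  e_lag2   <- if (i > 2) e_manual[i - 2] else 0
  e_manual[i] <- eta_hat[i] - cf[["ar1"]] * eta_lag1 -
    cf[["ma1"]] * e_lag1 - cf[["ma2"]] * e_lag2
}

# Early values differ because arima() uses exact Kalman-filter start-up
# instead of zeros, so compare only after a burn-in period
burn <- 1:30
max_diff <- max(abs(e_manual[-burn] - innov[-burn]))
print(head(data.frame(eta_hat, innov, e_manual)))
print(max_diff)
```

In this sketch the regression residuals differ from the response by the fitted intercept and slope term, so they are not the response itself.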
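Shouldn't the regression residuals of the second model be the same as (or at least similar to) the residuals of the first linear regression model?
Why do the regression residuals of the regression with ARIMA errors coincide with the values of the response variable (Consumption)?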
> us_change[,c("Time","Consumption")]
# A tsibble: 187 x 2 [1Q]
Time Consumption
<qtr> <dbl>
1 1970 Q1 0.616
2 1970 Q2 0.460
3 1970 Q3 0.877
4 1970 Q4 -0.274
5 1971 Q1 1.90
6 1971 Q2 0.912
7 1971 Q3 0.795
8 1971 Q4 1.65
9 1972 Q1 1.31
10 1972 Q2 1.89
# ... with 177 more rows
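This is the kind of result I expected instead. It uses the same kind of hypothetical base R simulation as before: stats::arima rather than fable, with arbitrary assumed values.

The sketch fits both an OLS regression and a regression with ARIMA(1,0,2) errors. The two sets of residuals should be highly correlated, since both estimate the same regression errors, and neither should reproduce the response series:

```r
# Hypothetical simulation: all "true" values below are arbitrary assumptions
set.seed(42)
n <- 187
income <- rnorm(n, mean = 0.7, sd = 0.9)
errors <- as.numeric(arima.sim(model = list(ar = 0.5, ma = c(0.4, 0.2)),
                               n = n, sd = 0.5))
consumption <- 0.6 + 0.2 * income + errors

# 1) Ordinary linear regression (analogue of TSLM(Consumption ~ Income))
ols <- lm(consumption ~ income)
ols_resid <- as.numeric(residuals(ols))

# 2) Regression with ARIMA(1,0,2) errors (analogue of ARIMA(Consumption ~ Income))
fit <- arima(consumption, order = c(1, 0, 2), xreg = cbind(income = income))
cf <- coef(fit)
reg_resid <- consumption - cf[["intercept"]] - cf[["income"]] * income

# The two residual series estimate the same regression errors
similarity <- cor(ols_resid, reg_resid)
# ...and the regression residuals should not simply equal the response
same_as_response <- isTRUE(all.equal(reg_resid, consumption))

print(round(c(ols_slope = coef(ols)[["income"]], arima_slope = cf[["income"]]), 3))
print(similarity)
print(same_as_response)
```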
What am I missing?
The example code is taken from "Forecasting: Principles and Practice" by Hyndman, Robin John and Athanasopoulos, George (https://otexts.com/fpp3/).