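All, I am trying to use ARIMA models in R to identify condition-based maintenance for equipment. I have two tags in a data frame (dd), each representing a unique piece of equipment. The data comes from an SQL query and is structured as follows: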
tag time value
1: GO.DIBTWS003_BATT_VOLT 2015-05-01 00:00:00 8.600000
18626 GO.MIPLES004_BATT_AVE 2015-08-06 00:00:00 7.700000
I would like to use the auto.arima function to help identify the best model for each of the two tags, as follows:
temp <- setDT(dd)[, list(AR = list(auto.arima(dd$value))), by = tag]
However, the results do not look correct. Referencing temp[[2]] gives the following:
Series: dd$value
ARIMA(4,1,1)
Coefficients:
ar1 ar2 ar3 ar4 ma1
-0.0026 -0.0661 -0.0329 -0.0190 -0.6677
s.e. 0.0207 0.0148 0.0120 0.0101 0.0195
sigma^2 estimated as 0.0003371: log likelihood=48026.14
AIC=-96040.28 AICc=-96040.28 BIC=-95993.29
[[2]]
Series: dd$value
ARIMA(4,1,1)
Coefficients:
ar1 ar2 ar3 ar4 ma1
-0.0026 -0.0661 -0.0329 -0.0190 -0.6677
s.e. 0.0207 0.0148 0.0120 0.0101 0.0195
sigma^2 estimated as 0.0003371: log likelihood=48026.14
AIC=-96040.28 AICc=-96040.28 BIC=-95993.29
The results are identical for every tag. Querying each tag on its own gives different coefficients:
Series: dd$value
ARIMA(3,1,4) with drift
Coefficients:
ar1 ar2 ar3 ma1 ma2 ma3 ma4 drift
0.0540 0.8200 -0.1115 -0.4868 -1.0131 0.5252 0.0924 -1e-04
s.e. 0.0681 0.0243 0.0520 0.0678 0.0342 0.0605 0.0330 1e-04
sigma^2 estimated as 0.0002066: log likelihood=26292.23
AIC=-52566.47 AICc=-52566.45 BIC=-52502.21
and
Series: dd$value
ARIMA(0,1,1)
Coefficients:
ma1
-0.8135
s.e. 0.0062
sigma^2 estimated as 0.0004347: log likelihood=22828.25
AIC=-45652.5 AICc=-45652.5 BIC=-45638.22
I am new to R. Can someone explain why this is happening?
Edit: here is a minimal reproducible example:
tag time value
1 GO.DIBTWS003_BATT_VOLT 2015-08-05 04:00:00 8.51
2 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:00:00 8.51
3 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:15:00 8.46
4 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:30:00 8.51
5 GO.MIPLES004_BATT_AVE 2015-08-05 07:00:00 7.70
6 GO.MIPLES004_BATT_AVE 2015-08-05 08:30:00 7.70
7 GO.MIPLES004_BATT_AVE 2015-08-05 08:45:00 7.59
8 GO.MIPLES004_BATT_AVE 2015-08-05 09:00:00 7.66
9 GO.MIPLES004_BATT_AVE 2015-08-05 09:15:00 7.72
10 GO.MIPLES004_BATT_AVE 2015-08-05 09:30:00 7.72
11 GO.MIPLES004_BATT_AVE 2015-08-05 09:45:00 7.73
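For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is only a sketch: the `tz = "UTC"` argument is my assumption (the real time zone is not part of the printed data), and the object is created as a plain data.frame before any call to setDT.

```r
# Rebuild the 11-row sample shown above as a plain data.frame named dd.
# Assumption: timestamps are treated as UTC; the original time zone is unknown.
dd <- data.frame(
  tag = c(rep("GO.DIBTWS003_BATT_VOLT", 4), rep("GO.MIPLES004_BATT_AVE", 7)),
  time = as.POSIXct(c(
    "2015-08-05 04:00:00", "2015-08-05 08:00:00", "2015-08-05 08:15:00",
    "2015-08-05 08:30:00", "2015-08-05 07:00:00", "2015-08-05 08:30:00",
    "2015-08-05 08:45:00", "2015-08-05 09:00:00", "2015-08-05 09:15:00",
    "2015-08-05 09:30:00", "2015-08-05 09:45:00"
  ), tz = "UTC"),
  value = c(8.51, 8.51, 8.46, 8.51, 7.70, 7.70, 7.59, 7.66, 7.72, 7.72, 7.73),
  stringsAsFactors = FALSE
)

# Quick look at how many rows each tag contributes.
print(table(dd$tag))
```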
Applying temp <- setDT(dd)[, list(AR = list(auto.arima(dd$value))), by = tag]
gives the following:
> temp[[2]]
[[1]]
Series: dd$value
ARIMA(0,1,0)
sigma^2 estimated as 0.06818: log likelihood=-0.76
AIC=3.52 AICc=4.02 BIC=3.83
[[2]]
Series: dd$value
ARIMA(0,1,0)
sigma^2 estimated as 0.06818: log likelihood=-0.76
AIC=3.52 AICc=4.02 BIC=3.83
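A quick check that may help narrow this down is to compare how many observations each group actually hands to the function. The sketch below assumes that, inside a grouped data.table call, `dd$value` refers to the whole column of dd while the bare column name `value` refers only to the current group's rows. The names `check` and `fits` are just illustrative, and the last step runs only if the forecast package is installed. I have not confirmed that this explains the behaviour on the full data set.

```r
library(data.table)

# Same 11 values as the sample above, with the time column left out for brevity.
dd <- data.table(
  tag = c(rep("GO.DIBTWS003_BATT_VOLT", 4), rep("GO.MIPLES004_BATT_AVE", 7)),
  value = c(8.51, 8.51, 8.46, 8.51, 7.70, 7.70, 7.59, 7.66, 7.72, 7.72, 7.73)
)

# Count the observations seen per group in two ways:
#   n_via_dd  uses dd$value (looks the table up by name, so all rows)
#   n_via_col uses value    (the column as subset for the current group)
check <- dd[, list(n_via_dd = length(dd$value), n_via_col = length(value)), by = tag]
print(check)

# If the counts differ, fitting on the bare column name is worth trying.
if (requireNamespace("forecast", quietly = TRUE)) {
  fits <- dd[, list(AR = list(forecast::auto.arima(value))), by = tag]
  print(fits$AR)
}
```

If `n_via_dd` shows 11 for both tags while `n_via_col` shows 4 and 7, every group was being fitted on the full column, which would be consistent with the identical models above.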
Applying the same operation, this time to a single tag, gives the following:
tag time value
1 GO.MIPLES004_BATT_AVE 2015-08-05 07:00:00 7.70
2 GO.MIPLES004_BATT_AVE 2015-08-05 08:30:00 7.70
3 GO.MIPLES004_BATT_AVE 2015-08-05 08:45:00 7.59
4 GO.MIPLES004_BATT_AVE 2015-08-05 09:00:00 7.66
5 GO.MIPLES004_BATT_AVE 2015-08-05 09:15:00 7.72
6 GO.MIPLES004_BATT_AVE 2015-08-05 09:30:00 7.72
7 GO.MIPLES004_BATT_AVE 2015-08-05 09:45:00 7.73
Series: dd$value
ARIMA(0,0,0) with non-zero mean
Coefficients:
intercept
7.6886
s.e. 0.0172
sigma^2 estimated as 0.002069: log likelihood=11.7
AIC=-19.4 AICc=-16.4 BIC=-19.51
Conversely, running the other tag on its own through the same script:
tag time value
1 GO.DIBTWS003_BATT_VOLT 2015-08-05 04:00:00 8.51
2 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:00:00 8.51
3 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:15:00 8.46
4 GO.DIBTWS003_BATT_VOLT 2015-08-05 08:30:00 8.51
gives:
> temp[[2]]
[[1]]
Series: dd$value
ARIMA(0,0,0) with non-zero mean
Coefficients:
intercept
8.4975
s.e. 0.0108
sigma^2 estimated as 0.0004688: log likelihood=9.66
AIC=-15.31 AICc=-3.31 BIC=-16.54