Question

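I'm trying to forecast a time series with the Python statsmodels ARIMA package, using an exogenous variable, but I can't work out the correct way to pass the exogenous variable in the prediction step. See here for the documentation.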

import numpy as np
from scipy import stats
import pandas as pd

import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.Series(vals)  # pd.TimeSeries has been removed from pandas; a plain Series works
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

fit1 = sm.tsa.ARIMA(df, (1,0,0)).fit()
#this works fine:
pred1 = fit1.predict(start=12, end = 16)
print(pred1)

Out[32]: 
2014-03-31    0.589121
2014-06-30    0.747575
2014-09-30    0.631322
2014-12-31    0.654858
2015-03-31    0.650093
Freq: Q-DEC, dtype: float64
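For reference, the out-of-sample part of an AR(1) forecast follows a simple recursion written around the process mean μ: ŷ(T+h) = μ + φ·(ŷ(T+h-1) - μ), so the predictions oscillate (φ < 0) or decay (φ > 0) geometrically toward μ, which is the pattern in the output above. A minimal NumPy sketch; the values of μ, φ and the last observation are assumptions for illustration, not the parameters of fit1:

```python
import numpy as np

def ar1_forecast(last_value, mu, phi, steps):
    """Iterate the AR(1) forecast recursion y_hat[h] = mu + phi * (y_hat[h-1] - mu)."""
    forecasts = np.empty(steps)
    prev = last_value
    for h in range(steps):
        prev = mu + phi * (prev - mu)
        forecasts[h] = prev
    return forecasts

# Illustrative (assumed) values only, not taken from fit1.params
fc = ar1_forecast(last_value=1.1, mu=0.65, phi=-0.2, steps=4)
print(fc)
```

Each step multiplies the previous deviation from μ by φ, so the h-step forecast equals μ + φ^h·(last_value - μ).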

Now add a trend as an exogenous variable:

exogx = np.array(range(1,14))
#to make this easy, let's look at the ols of the trend (arima(0,0,0))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.params)

const    0.555226
x1       0.013132
dtype: float64

print(fit2.fittedvalues)

2011-03-31    0.568358
2011-06-30    0.581490
2011-09-30    0.594622
2011-12-31    0.607754
2012-03-31    0.620886
2012-06-30    0.634018
2012-09-30    0.647150
2012-12-31    0.660282
2013-03-31    0.673414
2013-06-30    0.686546
2013-09-30    0.699678
2013-12-31    0.712810
2014-03-31    0.725942
Freq: Q-DEC, dtype: float64

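Note that, as expected, this is a trend line that increases by 0.013132 with each time step (this is random data, of course, so your values will differ if you run it, but the story of a positive or negative trend will be the same). So the next value (t = 14) should be 0.555226 + 0.013132 * 14 = 0.739074.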
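The same arithmetic extends to every date in the requested range: for ARIMA(0,0,0) with a constant and one regressor, each prediction is just const + x1 · t. A small sketch reusing the parameters printed above; the five dates 2014-03-31 through 2015-03-31 correspond to t = 13 through 17, assuming the regressor keeps counting up by one per quarter:

```python
import numpy as np

# Parameter estimates printed for fit2 above (const, x1)
const, beta = 0.555226, 0.013132

# t = 13 is the last in-sample point; t = 14..17 are the next four quarters
t = np.arange(13, 18)
expected = const + beta * t
for ti, yi in zip(t, expected):
    print(f"t={ti}: {yi:.6f}")
```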

#out of sample exog should be (14,15,16)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17)))
print(pred2)
2014-03-31    0.725942
2014-06-30    0.568358
2014-09-30    0.581490
2014-12-31    0.594622
2015-03-31    0.765338
Freq: Q-DEC, dtype: float64

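So 2014-03-31 is predicted correctly (it is the last in-sample point), but 2014-06-30 starts over from the beginning (t = 1). Notice also that 2015-03-31 (in fact, always the last observation of the forecast, whatever the horizon) jumps to t = 16, i.e. (value - intercept)/beta = (0.765338 - 0.555226)/0.013132.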
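The diagnosis above can be reproduced by inverting the trend line, t = (value - const) / x1, for each value of pred2 (copied from the output above). This shows which regressor value each prediction actually used:

```python
import numpy as np

const, beta = 0.555226, 0.013132
# pred2 values copied from the output above, 2014-03-31 through 2015-03-31
pred2 = np.array([0.725942, 0.568358, 0.581490, 0.594622, 0.765338])

# Solve value = const + beta * t for t
implied_t = (pred2 - const) / beta
print(np.round(implied_t).astype(int))
```

The result is 13, 1, 2, 3, 16 rather than the expected 13, 14, 15, 16, 17: the middle predictions reuse the in-sample regressor values from the start of the series, and only the last one picks up a value passed in via exog.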

To make this clearer, note what happens when I inflate the values of the exog matrix:

fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
Out[41]: 
2014-03-31       0.725942
2014-06-30       0.568358
2014-09-30       0.581490
2014-12-31       0.594622
2015-03-31    2101.680532
Freq: Q-DEC, dtype: float64

See how 2015-03-31 blows up, while none of the other exog values are taken into account? What am I doing wrong here?

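I've tried every way I know of passing the exog variable (changing its dimensions, making exog a matrix, making exog as long as the input plus the horizon, and so on). Any suggestions would be much appreciated.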

I'm using Python 2.7 from Anaconda 2.1, with numpy 1.8.1, scipy 0.14.0, pandas 0.14.0 and statsmodels 0.5.0,

and have verified the problem on both 64-bit Windows 7 and 64-bit CentOS.

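A few more things. I need ARIMA for its actual ARIMA functionality; the example above is only an illustration (that is, I can't "just use OLS...", as I imagine someone will suggest). I also can't "just use R" because of project constraints (more generally, the lack of R support in the underlying Spark).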

Here is the relevant part of the code, if you want to try it yourself:

import numpy as np
from scipy import stats
import pandas as pd
import statsmodels.api as sm

vals = np.random.rand(13)
ts = pd.Series(vals)  # pd.TimeSeries has been removed from pandas; a plain Series works
df = pd.DataFrame(ts, columns=["test"])
df.index = pd.Index(pd.date_range("2011/01/01", periods = len(vals), freq = 'Q'))

exogx = np.array(range(1,14))
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
print(fit2.fittedvalues)
pred2 = fit2.predict(start = 12, end = 16, exog = np.array(range(13,17))*10000)
print(pred2)

Answer 1

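This would probably be better posted on the GitHub issue tracker. I filed a ticket, though.

It's best to file a ticket there; otherwise I'll probably forget about it. Pretty busy these days.

There is a bug in the special-case logic for k_ar == 0. It should be fixed now. Let me know whether or not you're able to give that patch a spin. If not, I can do some more rigorous testing and merge it.

Statsmodels on Spark? I'm intrigued.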

Answer 2

You already specified the exog variable when fitting fit2, so there is no need to repeat it:

exogx = np.array(range(1,5)) # I think you will need 4 exogenous values to perform an ARIMAX(0,0,0), since you want an out-of-sample forecast 4 steps ahead
fit2 = sm.tsa.ARIMA(df, (0,0,0),exog = exogx).fit()
# if you want an out-of-sample forecast, use fit2.forecast(steps) instead
# I would do this:
pred = fit2.forecast(steps = 4)  # returns (forecast, stderr, conf_int); pred[0] is the forecast
fcst_index = pd.date_range(start = df.shift(1,'10T').index[-1]  , periods = 4, freq = '10T')
fcst_serie = pd.Series(data = pred[0], index = fcst_index)
print(fcst_serie)
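The '10T' (10-minute) frequency in the answer above presumably comes from the answerer's own data and would not line up with the question's quarterly index. A sketch of building the next `steps` quarter-end dates after the last observation, assuming the question's index (last date 2014-03-31). Passing a `QuarterEnd` offset object rather than a string alias sidesteps the 'Q' to 'QE' alias change in newer pandas:

```python
import pandas as pd

# Last in-sample date from the question's quarterly index
last_date = pd.Timestamp("2014-03-31")
steps = 4

# Adding QuarterEnd() to a date already on a quarter end rolls to the next quarter end
quarter_end = pd.offsets.QuarterEnd()
fcst_index = pd.date_range(start=last_date + quarter_end, periods=steps, freq=quarter_end)
print(fcst_index)
```

In the answer's code the first argument to `date_range` would then be `df.index[-1] + quarter_end`.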

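Hope it helps! This is a great article. I had never tried exogenous variables with ARIMA before, but the paper says it doesn't really matter which field you use it in (I can look up the paper if needed, or you can google it).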

Answer 3

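In case anyone is using the forecast function, this worked for me for one-step-ahead predictions.

history is the training array

exog is the array of exogenous variables

Y_exog_test is the corresponding out-of-sample value of the exogenous variable. Adapt it for ARIMAX and it should work fine.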

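import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(history, trend='c', order=(1,1,1),seasonal_order=(0,1,0,24),exog=exog)

model_fit = model.fit()

# one-step-ahead forecast; the future exog value is passed as a 1 x 1 (steps x regressors) array
predicted = model_fit.forecast(steps=1,exog=[[Y_exog_test]], dynamic=True)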
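statsmodels validates out-of-sample exogenous values against a shape of (steps, k_exog), i.e. one row per forecast step and one column per regressor; for a single regressor and one step, `[[Y_exog_test]]` is exactly such a 1 x 1 array. A small NumPy helper (the name `as_exog_rows` is hypothetical, not part of statsmodels) that produces that shape whether there is one regressor or several:

```python
import numpy as np

def as_exog_rows(values, steps=1):
    """Hypothetical helper: reshape exogenous values to (steps, k_exog)."""
    arr = np.asarray(values, dtype=float)
    return arr.reshape(steps, -1)

one_regressor = as_exog_rows(0.7)                 # one step, one regressor
three_regressors = as_exog_rows([0.7, 1.2, 3.0])  # one step, three regressors
print(one_regressor.shape, three_regressors.shape)
```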

Python ARIMA: out-of-sample forecasting with exogenous variables
