I have a question about time-series price prediction with exogenous data. I have a time-series dataset covering the years {2015, 2016}:
```
date,price,year,day,totaltx
1/1/2015 0:00,313.92,2015,1,62800
1/2/2015 0:00,314.59,2015,2,82545
1/3/2015 0:00,279.85,2015,3,82216
1/4/2015 0:00,263.63,2015,4,86991
1/5/2015 0:00,272.95,2015,5,95436
1/6/2015 0:00,285.58,2015,6,88299
1/7/2015 0:00,294.88,2015,7,91703
```
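Here my endogenous data is `price` and my exogenous data is `totaltx`. I am trying to run my code for horizons in {5, 10, 15}, using rolling-window regression. For training I use the subset of rows `[start_index:end_index]`, and I then use the previously trained SARIMAX to predict the test value `horizon` steps after the end of the training window.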
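The rolling-window bookkeeping can be checked on its own, before any model is fitted. A minimal sketch, assuming the training rows are `start_index .. end_index - 1` (Python slice semantics) and the target is the row `horizon` steps after the last training row; `rolling_windows` is a hypothetical helper. Note that a slice like `[k:k]` is always empty in Python, so the target must be addressed with a single index or a slice at least one row wide.

```python
def rolling_windows(n_rows, train_length, horizon):
    """Yield (train_start, train_end, target) for a window sliding one row at a time.

    Training rows are train_start .. train_end - 1, and target is the row
    `horizon` steps after the last training row (train_end - 1).
    """
    start = 0
    # Stop once the target row would fall past the end of the data.
    while start + train_length + horizon <= n_rows:
        end = start + train_length
        yield start, end, end + horizon - 1
        start += 1


# 10 rows, 5 training rows, target 2 steps ahead.
windows = list(rolling_windows(10, 5, 2))
print(windows)               # [(0, 5, 6), (1, 6, 7), (2, 7, 8), (3, 8, 9)]
print(list(range(10))[7:7])  # [] : a [k:k] slice selects nothing
```

Driving the loop by the target row, rather than by `end_index`, is what keeps the last window from reading past the end of the data.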
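```python
import numpy as np
import pandas as pd
import statsmodels.api as sm


def arima(bitcoinPrice, window, horizon, trainLength):
    predictions, prices = [], []
    inputNumber = bitcoinPrice.shape[0]
    start_index = 0
    # Slide the training window forward one day at a time, as long as the
    # target row (horizon steps after the last training row) still exists.
    while start_index + trainLength + horizon <= inputNumber:
        end_index = start_index + trainLength
        # Training rows are start_index .. end_index - 1.
        trainOutput = bitcoinPrice["price"].iloc[start_index:end_index]
        trainFeatures = bitcoinPrice[["totaltx"]].iloc[start_index:end_index]
        # forecast(steps=horizon) needs one exog row per step: rows
        # end_index .. end_index + horizon - 1. A slice [k:k] would be empty.
        testdata = bitcoinPrice[["totaltx"]].iloc[end_index:end_index + horizon]
        model = sm.tsa.statespace.SARIMAX(
            endog=trainOutput.values,
            exog=trainFeatures.values,
            order=(window, 0, 0),
            initialization='approximate_diffuse',
        )
        # Parameter order: exog coefficient, `window` AR coefficients, sigma2.
        start_params = np.r_[np.zeros(1 + window), 1.0]
        model_fit = model.fit(disp=False, start_params=start_params)
        # The last of the `horizon` forecasts is for row end_index + horizon - 1.
        predicted = model_fit.forecast(steps=horizon, exog=testdata.values)[-1]
        price = bitcoinPrice["price"].iloc[end_index + horizon - 1]
        print(f"price: {price} predicted: {predicted}")
        predictions.append(predicted)
        prices.append(price)
        start_index += 1
    return predictions, prices


trainLength = 100
bitcoinPrice = pd.read_csv("..\\prices.csv", sep=",")  # read once, outside the loops
for window in [3, 5]:
    for horizon in [5, 10, 15]:
        predictions, prices = arima(bitcoinPrice, window, horizon, trainLength)
```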
However, I am confused about how to use SARIMAX's `fit` and `forecast` functions.
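On the `forecast` side, the exogenous data has to cover the forecast period: for a model fitted with `exog`, statsmodels expects the `exog` argument of `forecast(steps=h)` to supply one row per forecast step, so an empty slice cannot work. A minimal sketch of building that block from the CSV columns above, assuming the training window covers rows `start_index .. end_index - 1`; `forecast_exog` is a hypothetical helper, not part of statsmodels.

```python
import pandas as pd


def forecast_exog(df, end_index, horizon, column="totaltx"):
    """Return the exog block for forecast(steps=horizon) after training on rows < end_index.

    Row i of the result is the exogenous value for forecast step i + 1,
    i.e. original rows end_index .. end_index + horizon - 1.
    """
    block = df[[column]].iloc[end_index:end_index + horizon].to_numpy()
    if block.shape != (horizon, 1):
        raise ValueError(
            f"need {horizon} future rows of {column!r}, only {block.shape[0]} available"
        )
    return block


# The seven sample rows from the question.
df = pd.DataFrame({
    "price": [313.92, 314.59, 279.85, 263.63, 272.95, 285.58, 294.88],
    "totaltx": [62800, 82545, 82216, 86991, 95436, 88299, 91703],
})
block = forecast_exog(df, end_index=3, horizon=2)
print(block.shape)  # (2, 1)
```

Raising early when the data runs out gives a clearer message than letting the model reject a short or empty `exog` array.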
For horizon = 5, I did not shift the output (the `price` column) when training SARIMAX. Should I shift it?
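Whether to shift depends on the forecasting strategy. `forecast(steps=horizon)` produces a multi-step forecast by running the fitted model forward from the end of the training data, so it is normally trained on the unshifted `price` series. Shifting the output belongs to the direct strategy, where a separate regression maps the features at day t to the price at day t + horizon. A minimal pandas sketch of that shifted target on the sample rows above; `direct_target` and the `price_t+5` column name are hypothetical.

```python
import pandas as pd


def direct_target(df, horizon, target="price"):
    """Add a column holding `target` shifted `horizon` rows into the future.

    Each remaining row pairs the features at day t with the price at day
    t + horizon; the last `horizon` rows have no future price and are dropped.
    """
    name = f"{target}_t+{horizon}"
    out = df.copy()
    out[name] = out[target].shift(-horizon)
    return out.dropna(subset=[name])


df = pd.DataFrame({
    "price": [313.92, 314.59, 279.85, 263.63, 272.95, 285.58, 294.88],
    "totaltx": [62800, 82545, 82216, 86991, 95436, 88299, 91703],
})
shifted = direct_target(df, horizon=5)
print(shifted["price_t+5"].tolist())  # [285.58, 294.88]
```

With this layout the model sees only information available at day t, so no future `totaltx` values are needed at prediction time; the cost is one model per horizon.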
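For the forecasting part, what is the exogenous data? If I am going to predict the price at index `end_index+horizon`, is the exogenous data the `totaltx` column at index `end_index+horizon`?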
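The role of the exogenous values at forecast time is easier to see with the model written out. statsmodels' SARIMAX treats `exog` as a regression with SARIMA errors, so for one regressor and AR(1) errors the model is y_t = βx_t + u_t with u_t = φu_{t-1} + e_t, and the k-step forecast from the last training time T is ŷ_{T+k} = βx_{T+k} + φ^k u_T. The forecast for step k therefore uses the exogenous value at that same step, and statsmodels asks for x at all h steps because `forecast` returns every intermediate step. The sketch below is a simplified stand-in, not what SARIMAX does internally: it estimates β and φ with two ordinary least-squares steps instead of maximum likelihood, on simulated data; `fit_reg_ar1` and `forecast_reg_ar1` are hypothetical names.

```python
import numpy as np


def fit_reg_ar1(y, x):
    """Two-step least-squares fit of y_t = beta * x_t + u_t, u_t = phi * u_{t-1} + e_t.

    Simplification: SARIMAX estimates all parameters jointly by maximum
    likelihood rather than in two OLS steps.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    beta = np.dot(x, y) / np.dot(x, x)  # step 1: regression on exog (no intercept)
    u = y - beta * x                    # residuals carry the AR(1) error process
    phi = np.dot(u[:-1], u[1:]) / np.dot(u[:-1], u[:-1])  # step 2: AR(1) on residuals
    return beta, phi, u[-1]             # u[-1] is the last in-sample error u_T


def forecast_reg_ar1(beta, phi, u_last, x_future):
    """Forecast steps T+1..T+h: y_hat_{T+k} = beta * x_{T+k} + phi**k * u_T."""
    x_future = np.asarray(x_future, dtype=float)
    k = np.arange(1, len(x_future) + 1)  # one exog value per forecast step
    return beta * x_future + phi ** k * u_last


# Simulated data: y_t = 2 * x_t + u_t with AR(1) errors (phi = 0.7).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 2.0 * x + u

beta, phi, u_last = fit_reg_ar1(y, x)
preds = forecast_reg_ar1(beta, phi, u_last, x_future=[0.5, 1.0, 1.5])
```

In the rolling-window setting, `x_future` corresponds to the `totaltx` rows right after the training window, one per step up to the target.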