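Long-horizon forecasting with SARIMAX

Time: 2019-01-12 04:58:13

Tags: python time-series

I have a question about time-series price forecasting with exogenous data. I have a multi-year time-series dataset covering the years {2015, 2016}: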

date,price,year,day,totaltx
1/1/2015 0:00,313.92,2015,1,62800
1/2/2015 0:00,314.59,2015,2,82545
1/3/2015 0:00,279.85,2015,3,82216
1/4/2015 0:00,263.63,2015,4,86991
1/5/2015 0:00,272.95,2015,5,95436
1/6/2015 0:00,285.58,2015,6,88299
1/7/2015 0:00,294.88,2015,7,91703

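Here, my endogenous data is price and my exogenous data is totaltx. I am trying to run my code for long forecast horizons in {5, 10, 15}. I am using rolling-window regression. For training, I use the data subset [start_index:end_index]. As test data for the SARIMAX model fitted on that subset, I then try to predict the price horizon steps past the end of the training window.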
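The rolling-window indexing described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name rolling_windows and its step parameter are hypothetical, and the target is taken to be the horizon-th row after the training slice. Because a slice such as [start_index:end_index] excludes end_index, that row is end_index + horizon - 1.

```python
def rolling_windows(n_rows, train_length, horizon, step=1):
    """Yield (train_start, train_end, target_index) for each window position.

    train_end is exclusive, matching data[train_start:train_end].
    target_index is the horizon-th row after the training slice.
    """
    start = 0
    # Stop as soon as the forecast target would fall outside the data.
    while start + train_length + horizon <= n_rows:
        train_end = start + train_length
        target_index = train_end + horizon - 1
        yield start, train_end, target_index
        start += step


# Example: 12 rows, 5 training rows per window, forecasting 3 steps ahead.
windows = list(rolling_windows(n_rows=12, train_length=5, horizon=3))
for train_start, train_end, target_index in windows:
    print(train_start, train_end, target_index)
```

To select a single target row with a slice, use [target_index:target_index + 1]. A slice whose start and stop are equal is empty.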

import numpy as np
import pandas as pd
import statsmodels.api as sm

def arima(bitcoinPrice, window, horizon, trainLength):
    """Rolling-window SARIMAX: fit on trainLength rows, forecast horizon steps ahead."""
    predictions, prices = [], []
    inputNumber = bitcoinPrice.shape[0]
    start_index = 0
    # slide over the time series with a 1-day step; stop once the forecast
    # window would run past the end of the data
    while start_index + trainLength + horizon <= inputNumber:
        end_index = start_index + trainLength
        train = bitcoinPrice.iloc[start_index:end_index]
        # rows end_index .. end_index+horizon-1: the horizon steps to forecast
        future = bitcoinPrice.iloc[end_index:end_index + horizon]

        model = sm.tsa.statespace.SARIMAX(endog=train["price"].values, exog=train["totaltx"].values, order=(window, 0, 0), initialization='approximate_diffuse')
        # one start value per parameter: 1 exog coefficient, `window` AR terms, then the variance
        model_fit = model.fit(disp=0, start_params=[0] * (window + 1) + [1])
        # forecast() needs one exog row per forecast step, so exog has shape (horizon, 1)
        forecast = model_fit.forecast(steps=horizon, exog=future["totaltx"].values.reshape(-1, 1))
        predicted = forecast[-1]  # the horizon-th step ahead
        price = future["price"].values[-1]

        print("price: " + str(price) + " predicted: " + str(predicted))
        predictions.append(predicted)
        prices.append(price)
        start_index = start_index + 1
    return predictions, prices

trainLength = 100
bitcoinPrice = pd.read_csv("..\\prices.csv", sep=",")  # read once, reuse for every setting
for window in [3, 5]:
    for horizon in [5, 10, 15]:
        predictions, prices = arima(bitcoinPrice, window, horizon, trainLength)

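However, I am confused about how to use SARIMAX's fit and forecast functions.

  1. For horizon = 5, I did not shift the output (the price column) when training SARIMAX. Should I shift the output?

  2. For the forecasting part, what should the exogenous data be? If I want to predict the price at index end_index+horizon, is the exogenous data the totaltx column at index end_index+horizon?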

0 answers:

No answers yet.