'Cannot add integral value to Timestamp without freq' error for ARIMA model despite reindexing with a frequency

Posted: 2017-07-12 07:23:56

Tags: python pandas statsmodels arima

I am trying to forecast this time series with an ARIMA model:

1960-01-01    12.7
1961-01-01    12.1
1962-01-01    12.7
1963-01-01    12.8
1964-01-01    12.3
1965-01-01    13.0
1966-01-01    12.5
1967-01-01    12.9
1968-01-01    12.9
1969-01-01    13.3
1970-01-01    13.2
1971-01-01    13.0
1972-01-01    12.6
1973-01-01    12.2
1974-01-01    12.4
1975-01-01    12.7
1976-01-01    12.6
1977-01-01    12.2
1978-01-01    12.5
1979-01-01    12.2
1980-01-01    12.2
1981-01-01    12.2
1982-01-01    12.1
1983-01-01    12.3
1984-01-01    11.7
1985-01-01    11.8
1986-01-01    11.5
1987-01-01    11.2
1988-01-01    11.0
1989-01-01    10.9
1990-01-01    10.8
1991-01-01    10.8
1992-01-01    10.6
1993-01-01    10.4
1994-01-01    10.2
1995-01-01    10.2
1996-01-01    10.2
1997-01-01    10.0
1998-01-01     9.8
1999-01-01     9.8
2000-01-01     9.6
2001-01-01     9.3
2002-01-01     9.4
2003-01-01     9.5
2004-01-01     9.1
2005-01-01     9.1
2006-01-01     9.0
2007-01-01     9.0
2008-01-01     9.0
2009-01-01     9.3
2010-01-01     9.2
2011-01-01     9.1
2012-01-01     9.4
2013-01-01     9.4
2014-01-01     9.2
2015-01-01     9.6
Name: Death rate, crude (per 1,000 people), dtype: float64

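I use the following code to generate different (p,d,q) combinations, fit a model for each one and record its AIC, and then pick the combination with the smallest AIC. That (p,d,q) order is then used for the forecast.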
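For reference, the criterion being minimized is the Akaike information criterion. In its textbook form, with k estimated parameters and maximized likelihood L̂, it is written as below; lower is better, and the 2k term penalizes orders with more parameters. Exactly which parameters statsmodels counts in k depends on the model specification and version, so treat this as the standard definition rather than the library's internal formula.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```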

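import itertools
import warnings

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error as mse

# cmortS is the "Death rate, crude (per 1,000 people)" series shown above.
# sm.tsa.ARIMA here is the pre-0.13 statsmodels ARIMA, whose predict() accepts typ=.

def MAPE(A, F):
    # Mean absolute percentage error (in %) of forecast F against actual A
    Av = np.asarray(A, dtype=float)
    Fv = np.asarray(F, dtype=float)
    mape = np.mean(np.abs((Av - Fv) / Av)) * 100
    return np.around(mape, decimals=2)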

# Generate all (p, d, q) combinations with each term in 0..6
p = d = q = range(7)
pdq = list(itertools.product(p, d, q))

# Fit each order and record its AIC; orders that fail to fit are skipped
warnings.filterwarnings('ignore')
param_aic = {}
for param in pdq:
    try:
        mod = sm.tsa.ARIMA(cmortS, order=param)
        result = mod.fit()
        param_aic[param] = result.aic
    except Exception:
        continue

min_aic = min(param_aic.values())
min_param = ()
for pm, aic in param_aic.items():
    if aic == min_aic:
        min_param = pm

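# Refit the model using the order with the lowest AIC
model = sm.tsa.ARIMA(cmortS, order=min_param)
results = model.fit()

# Forecast validation: with d > 0, typ='levels' returns predictions in the
# original units rather than in differences
tp = ''
if min_param[1] > 0:
    tp = 'levels'
else:
    tp = 'linear'

# Hold out the last third of the series. Note that results was fit on the
# full series, so this compares in-sample predictions against tst.
train_sz = int(len(cmortS)*0.66)
train = cmortS[:train_sz]
tst = cmortS[train_sz:]
pred_strt = tst.index[0]
tst_pred = results.predict(start=pred_strt, typ=tp)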
mserror = mse(tst, tst_pred)
mserror = np.round(mserror, decimals= 5)
mp = MAPE(tst, tst_pred)
print('Model order: {}, MAPE: {}%, mse: {}'.format(min_param, mp, mserror)) 

# Prediction
end_yr = '2050'
end_dt = pd.to_datetime(end_yr, format= '%Y')
strt_dt = pd.to_datetime('2014', format= '%Y')
Var_pred = results.predict(start= strt_dt, end= end_dt, typ = tp)

Var_pred
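The loop that searches `param_aic` for the smallest AIC can also be written with `min(..., key=...)`, which returns the dict key whose value is smallest. A minimal sketch; `best_order` is a hypothetical helper name and the AIC numbers are made up, not results fitted to this series. One small difference: on an exact tie, `min` keeps the first order encountered, whereas the loop in the question keeps the last.

```python
def best_order(param_aic):
    """Return the (p, d, q) order with the smallest AIC in an {order: aic} dict."""
    if not param_aic:
        # Every fit failed; min() on an empty dict would raise a less helpful error.
        raise ValueError("no ARIMA order could be fitted")
    return min(param_aic, key=param_aic.get)


# Hypothetical AIC values for illustration only; they are not fitted results.
param_aic = {(1, 1, 0): -12.4, (0, 1, 1): -13.9, (2, 1, 2): -11.0}
print(best_order(param_aic))  # (0, 1, 1)
```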

When I run it, I get the following error:

ValueError: Cannot add integral value to Timestamp without freq.
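The message comes from pandas rather than from the ARIMA code itself: older pandas versions allowed `Timestamp + n` with an integer `n` only when the Timestamp carried a `freq`, and raised exactly this `ValueError` otherwise; pandas 1.0 and later reject integer addition with a `TypeError` regardless. Presumably the prediction code turns the requested dates into a number of steps and adds that integer to a date that has no frequency attached. The sketch below reproduces the integer case and shows the unambiguous alternative of adding a `DateOffset`; the names `ts`, `end` and `horizon` are hypothetical.

```python
import pandas as pd

ts = pd.Timestamp("2014-01-01")

# A bare integer does not say "36 of what?", so pandas refuses it.
# Old pandas: ValueError("Cannot add integral value to Timestamp without freq.")
# pandas 1.0+: TypeError for any integer, with or without a freq.
error_type = None
try:
    ts + 36
except (TypeError, ValueError) as exc:
    error_type = type(exc).__name__
print(error_type)

# Stating the unit with a DateOffset is always well defined:
# 36 year-starts after 2014-01-01 is 2050-01-01.
end = ts + pd.offsets.YearBegin(36)
print(end)  # 2050-01-01 00:00:00

# Out-of-sample steps after the last observation (2015-01-01) up to 2050.
horizon = pd.date_range("2016-01-01", end, freq="YS")
print(len(horizon))  # 35
```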

Although I reindexed the series with a date range using freq='AS', I still get the same error.
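The frequency is an attribute of the index itself: a `DatetimeIndex` built from a list of dates has `freq=None` even when the dates are evenly spaced, so it is worth checking `series.index.freq` directly after reindexing. A minimal pandas sketch for checking and setting it, using only the first five values of the series above; the variable names are hypothetical, and `"YS"` (year start) is the newer spelling of the `"AS"` alias used in the question. Whether the ARIMA model then uses that frequency depends on the statsmodels version, so treat this as a diagnostic step rather than a guaranteed fix.

```python
import pandas as pd

# First five points of the series above, for illustration only.
values = [12.7, 12.1, 12.7, 12.8, 12.3]
idx = pd.to_datetime(["1960-01-01", "1961-01-01", "1962-01-01",
                      "1963-01-01", "1964-01-01"])
s = pd.Series(values, index=idx, name="Death rate")

# Evenly spaced, but the index does not record a frequency.
print(s.index.freq)  # None

# pandas can infer the spacing (spelled 'AS-JAN' or 'YS-JAN' depending on version)...
print(pd.infer_freq(s.index))

# ...and asfreq returns a copy whose index carries the frequency explicitly.
s_yearly = s.asfreq("YS")
print(s_yearly.index.freq)
```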

How can I fix this?

1 Answer:

Answer 0 (score: 1)

Changing the last few lines of your code to the following should resolve the error:

# Prediction
strt_date = pd.to_datetime('2014-01-01 01:00:00')
end_date = pd.to_datetime('2050-01-01 01:00:00')
Var_pred = results.predict(start = strt_date, end = end_date, typ = tp) 
Var_pred