I am trying to forecast a time series with an ARIMA model, using this series:
1960-01-01 12.7
1961-01-01 12.1
1962-01-01 12.7
1963-01-01 12.8
1964-01-01 12.3
1965-01-01 13.0
1966-01-01 12.5
1967-01-01 12.9
1968-01-01 12.9
1969-01-01 13.3
1970-01-01 13.2
1971-01-01 13.0
1972-01-01 12.6
1973-01-01 12.2
1974-01-01 12.4
1975-01-01 12.7
1976-01-01 12.6
1977-01-01 12.2
1978-01-01 12.5
1979-01-01 12.2
1980-01-01 12.2
1981-01-01 12.2
1982-01-01 12.1
1983-01-01 12.3
1984-01-01 11.7
1985-01-01 11.8
1986-01-01 11.5
1987-01-01 11.2
1988-01-01 11.0
1989-01-01 10.9
1990-01-01 10.8
1991-01-01 10.8
1992-01-01 10.6
1993-01-01 10.4
1994-01-01 10.2
1995-01-01 10.2
1996-01-01 10.2
1997-01-01 10.0
1998-01-01 9.8
1999-01-01 9.8
2000-01-01 9.6
2001-01-01 9.3
2002-01-01 9.4
2003-01-01 9.5
2004-01-01 9.1
2005-01-01 9.1
2006-01-01 9.0
2007-01-01 9.0
2008-01-01 9.0
2009-01-01 9.3
2010-01-01 9.2
2011-01-01 9.1
2012-01-01 9.4
2013-01-01 9.4
2014-01-01 9.2
2015-01-01 9.6
Name: Death rate, crude (per 1,000 people), dtype: float64
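I use the following code to generate different (p, d, q) values, fit a model for each one, and record its AIC. I then select the order with the lowest AIC and use that (p, d, q) order for the forecast.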
import datetime
import warnings
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error as mse
def MAPE (A, F):
    # Mean absolute percentage error between actual (A) and forecast (F) series
    import numpy as np
    n = len(A)
    Av = np.array(A.values)
    Fv = np.array(F.values)
    mape = np.mean(np.abs((Av-Fv)/Av))*100
    mape = np.around(mape, decimals= 2)
    return mape
# Generate pdq combinations
p= d= q= range(7)
pdq = list(itertools.product(p, d, q))
# Choose min pdq corresponding to min AIC
warnings.filterwarnings('ignore')
param_aic = {}
for param in pdq:
    try:
        mod = sm.tsa.ARIMA(cmortS, order= param)
        result = mod.fit()
        param_aic[param] = result.aic
    except Exception:
        # Skip orders for which the model cannot be fitted
        continue
min_aic = min(param_aic.values())
min_param = ()
for pm, aic in param_aic.items():
    if aic == min_aic:
        min_param = pm
# Run the model with min pdq
model = sm.tsa.ARIMA(cmortS, order= min_param)
results = model.fit()
# Forecast validation
tp = ''
if min_param[1] > 0:
    tp = 'levels'
else:
    tp = 'linear'
train_sz = int(len(cmortS)*0.66)
train = cmortS[:train_sz]
tst = cmortS[train_sz:]
pred_strt = tst.index[0]
tst_pred = results.predict(start= pred_strt, typ= tp)
mserror = mse(tst, tst_pred)
mserror = np.round(mserror, decimals= 5)
mp = MAPE(tst, tst_pred)
print('Model order: {}, MAPE: {}%, mse: {}'.format(min_param, mp, mserror))
# Prediction
end_yr = '2050'
end_dt = pd.to_datetime(end_yr, format= '%Y')
strt_dt = pd.to_datetime('2014', format= '%Y')
Var_pred = results.predict(start= strt_dt, end= end_dt, typ = tp)
Var_pred
When I run it, I get the following error:
ValueError: Cannot add integral value to Timestamp without freq.
Although I reindexed the series with a date range using freq='AS', I still get the same error.
How can I fix this?
Answer 0 (score: 1)
Changing the last few lines of the code to the following should resolve the error message:
# Prediction
strt_date = pd.to_datetime('2014-01-01 01:00:00')
end_date = pd.to_datetime('2050-01-01 01:00:00')
Var_pred = results.predict(start = strt_date, end = end_date, typ = tp)
Var_pred
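The error message suggests that the series index carries no frequency, so it may also help to check the index and set the frequency explicitly before fitting the model. The following is only a minimal sketch using pandas on a three-year excerpt of the series from the question. It assumes the index was built from parsed dates, and it uses the "YS" alias, which is the same year-start offset as "AS". The name `future` is introduced here for illustration.

```python
import pandas as pd

# Short excerpt of the series from the question (2013 to 2015).
dates = pd.to_datetime(["2013-01-01", "2014-01-01", "2015-01-01"])
cmortS = pd.Series([9.4, 9.2, 9.6], index=dates)

# An index built from parsed dates carries no frequency information.
print(cmortS.index.freq)  # None

# Attach a year-start frequency to the index.
cmortS = cmortS.asfreq("YS")
print(cmortS.index.freq)

# With a frequency in place, dates beyond the last observation
# can be generated by stepping the index forward.
future = pd.date_range(start=cmortS.index[-1], end="2050-01-01",
                       freq=cmortS.index.freq)
print(future[:3])
```

If the frequency is still None after reindexing, the reindexed result was probably not assigned back to the variable passed to the model, which would be worth checking first.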