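I'm trying to run a block of code that evaluates the mean squared error for different values of P, D and Q in a SARIMAX model. This exact block worked fine for me before and I haven't changed it anywhere, so I can only assume the problem is with the data. However, I processed the data in the same way as before, so I don't understand why it no longer works.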

    import statsmodels.api as sm
    from sklearn.metrics import mean_squared_error

    def evaluate_sarima_model(data, arima_order, s_order):
        # hold out the last 20% of the observations for testing
        split = int(len(data) * 0.8)
        train, test = data[0:split], data[split:len(data)]
        past = [x for x in train]
        # make predictions
        predictions = list()
        for i in range(len(test)):
            model = sm.tsa.statespace.SARIMAX(past, order=arima_order, seasonal_order=s_order, enforce_stationarity=False, enforce_invertibility=False)
            model_fit = model.fit(disp=0)
            future = model_fit.forecast()[0]
            predictions.append(future)
            past.append(test[i])
        # calculate out of sample error
        error = mean_squared_error(test, predictions)
        return error

    def evaluate_models(dataset, p_values, d_values, q_values, P_values, D_values, Q_values):
        best_score, best_cfg = float("inf"), None
        for p in p_values:
            for d in d_values:
                for q in q_values:
                    for P in P_values:
                        for D in D_values:
                            for Q in Q_values:
                                order = (p, d, q)
                                s_order = (P, D, Q, 12)
                                try:
                                    mse = evaluate_sarima_model(dataset, order, s_order)
                                    if mse < best_score:
                                        best_score, best_cfg, seas = mse, order, s_order
                                    print('SARIMA%s %s MSE=%.3f' % (order, seas, mse))
                                except:
                                    continue
        return print('Best SARIMA%s %s MSE=%.3f' % (best_cfg, seas, best_score))

    p_values = [1]
    d_values = [1]
    q_values = [1]
    P_values = [x for x in range(0, 3)]
    D_values = [x for x in range(0, 3)]
    Q_values = [x for x in range(0, 3)]

I'm using the following dataset:

    DatetimeIndex: 175 entries, 2005-12-01 to 2020-06-01
    Freq: MS
    Data columns (total 1 columns):
     #   Column    Non-Null Count  Dtype
    ---  ------    --------------  -----
     0   turnover  175 non-null    int32
    dtypes: int32(1)
    memory usage: 7.1 KB

When I run it, I get the following error:

    evaluate_models(turnover_month, p_values, d_values, q_values, P_values, D_values, Q_values)
    UnboundLocalError: local variable 'seas' referenced before assignment

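If I run the single model-fitting line on its own with arbitrary values of P, D and Q, I get the error below, so my assumption is that the problem lies in the first block and in how it handles this dataset:
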
    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Answer 0 (score: 0)
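After going through the code line by line, the problem became obvious: the SARIMAX model cannot handle a DataFrame here and needs a Series instead. Just call a simple .squeeze() on the df.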
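
To illustrate why the DataFrame breaks the function, here is a minimal sketch. It uses a small made-up frame as a stand-in for `turnover_month` (the values and the six-row length are assumptions, not the real data). Iterating over a DataFrame yields its column labels, so `past = [x for x in train]` most likely ends up as a list containing only the string `'turnover'`, which would explain the `isnan` TypeError.

```python
import pandas as pd

# Hypothetical stand-in for the turnover_month DataFrame from the question;
# the values are invented purely for illustration.
index = pd.date_range("2005-12-01", periods=6, freq="MS")
turnover_month = pd.DataFrame({"turnover": [10, 12, 11, 13, 14, 15]}, index=index)

# Iterating over a DataFrame yields its column labels, not its values,
# so this list holds the column name rather than the observations.
past_from_frame = [x for x in turnover_month]

# squeeze() turns a single-column DataFrame into a Series;
# iterating over a Series yields the actual observations.
turnover_series = turnover_month.squeeze()
past_from_series = [x for x in turnover_series]

print(past_from_frame)
print(type(turnover_series).__name__)
print(len(past_from_series))
```

With the real data, the call would then be `evaluate_models(turnover_month.squeeze(), ...)`, or equivalently the column could be selected with `turnover_month['turnover']`. Note also that the bare `except: continue` in `evaluate_models` swallows the TypeError for every configuration, so `seas` is never assigned; that is why the error that surfaces is the UnboundLocalError instead of the real cause. Catching a narrower exception, or printing the exception before continuing, would have exposed the problem sooner.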