Statsmodels - TypeError: ufunc 'isnan' not supported for the input types

Posted: 2020-07-14 14:49:04

Tags: python datetime statsmodels arima

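I am trying to run a block of code that computes the mean squared error of a SARIMAX model for different values of P, D and Q. This exact block worked fine for me before and I have not changed it anywhere, so I can only assume the problem is with the data. However, I prepared the data the same way as before, so I don't understand why it isn't working.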


import statsmodels.api as sm
from sklearn.metrics import mean_squared_error

def evaluate_sarima_model(data, arima_order, s_order):
    # walk-forward validation: train on the first 80%, test on the last 20%
    split = int(len(data) * 0.8)
    train, test = data[0:split], data[split:len(data)]
    past = [x for x in train]
    # make predictions
    predictions = list()
    for i in range(len(test)):
        model = sm.tsa.statespace.SARIMAX(past, order=arima_order, seasonal_order=s_order,
                                          enforce_stationarity=False, enforce_invertibility=False)
        model_fit = model.fit(disp=0)
        future = model_fit.forecast()[0]  # one-step-ahead forecast
        predictions.append(future)
        past.append(test[i])  # add the observed value before the next refit
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

def evaluate_models(dataset, p_values, d_values, q_values, P_values, D_values, Q_values):
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                for P in P_values:
                    for D in D_values:
                        for Q in Q_values:
                            order = (p, d, q)
                            s_order = (P, D, Q, 12)  # seasonal period 12 for monthly data
                            try:
                                mse = evaluate_sarima_model(dataset, order, s_order)
                                if mse < best_score:
                                    best_score, best_cfg, seas = mse, order, s_order
                                print('SARIMA%s %s MSE=%.3f' % (order, s_order, mse))
                            except:  # any exception is silently skipped
                                continue
    return print('Best SARIMA%s %s MSE=%.3f' % (best_cfg, seas, best_score))

p_values = [1]
d_values = [1]
q_values = [1]
P_values = list(range(0, 3))
D_values = list(range(0, 3))
Q_values = list(range(0, 3))
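The six nested loops simply enumerate the Cartesian product of the value lists. A sketch of the same enumeration with itertools.product (the sarima_grid helper name is hypothetical, not part of the question), which also makes the size of the search easy to check:

```python
from itertools import product

# Value lists as in the question: p, d, q fixed at 1,
# seasonal P, D, Q each in 0..2, seasonal period 12 (monthly data).
p_values, d_values, q_values = [1], [1], [1]
P_values = D_values = Q_values = list(range(3))


def sarima_grid(p_values, d_values, q_values, P_values, D_values, Q_values, m=12):
    """Yield every (order, seasonal_order) pair, in the same order as the nested loops."""
    for p, d, q, P, D, Q in product(p_values, d_values, q_values,
                                    P_values, D_values, Q_values):
        yield (p, d, q), (P, D, Q, m)


grid = list(sarima_grid(p_values, d_values, q_values, P_values, D_values, Q_values))
print(len(grid))   # 27
print(grid[0])     # ((1, 1, 1), (0, 0, 0, 12))
```

product varies its last argument fastest, which matches Q being the innermost loop. With the 175-observation dataset described in the question and the 80/20 split, the test set has 35 points, so this grid implies 27 × 35 = 945 SARIMAX fits.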

I am using the following dataset:

DatetimeIndex: 175 entries, 2005-12-01 to 2020-06-01
Freq: MS
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   turnover  175 non-null    int32
dtypes: int32(1)
memory usage: 7.1 KB 
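To see how a DataFrame with this layout behaves inside evaluate_sarima_model, here is a small sketch on synthetic data shaped like the info() output above (175 monthly "MS" entries starting 2005-12-01, one int32 column named turnover; the values themselves are made up). It contrasts what the list comprehension past = [x for x in train] produces for a DataFrame versus a Series:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (values are invented).
idx = pd.date_range("2005-12-01", periods=175, freq="MS")
turnover_month = pd.DataFrame(
    {"turnover": np.arange(175, dtype=np.int32)}, index=idx
)

train = turnover_month.iloc[:140]  # first 80% of the rows

# Iterating over a DataFrame yields its column labels, not its rows...
past_from_frame = [x for x in train]
print(past_from_frame)          # ['turnover']

# ...whereas iterating over a Series yields the values themselves.
series = turnover_month.squeeze("columns")
past_from_series = [x for x in series.iloc[:140]]
print(len(past_from_series))    # 140
```

squeeze("columns") is used rather than a bare squeeze() so that only the column axis is collapsed; the result is a Series named turnover with the same DatetimeIndex.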

When I run it, I get the following error:


evaluate_models(turnover_month, p_values, d_values, q_values, P_values, D_values, Q_values)

UnboundLocalError: local variable 'seas' referenced before assignment 
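An UnboundLocalError like this is a symptom rather than the cause: seas is only assigned inside the try block after a configuration has been evaluated successfully, and the bare except: swallows every exception. A minimal reproduction with a hypothetical pick_best helper and a scoring function that always raises, followed by a variant that initialises the variable and records the errors instead of hiding them:

```python
def pick_best(configs, score):
    """Mimics the question's loop: best_cfg is initialised, seas is not."""
    best_score, best_cfg = float("inf"), None
    for cfg in configs:
        try:
            s = score(cfg)
            if s < best_score:
                best_score, best_cfg, seas = s, cfg, cfg  # seas bound only here
        except:  # bare except, as in the question: every error is swallowed
            continue
    return best_cfg, seas, best_score  # UnboundLocalError if nothing succeeded


def pick_best_reporting(configs, score):
    """Same search, but every name is initialised and failures are kept."""
    best_score, best_cfg, seas = float("inf"), None, None
    errors = []
    for cfg in configs:
        try:
            s = score(cfg)
        except Exception as exc:  # keep the exception so it can be inspected
            errors.append((cfg, exc))
            continue
        if s < best_score:
            best_score, best_cfg, seas = s, cfg, cfg
    return best_cfg, seas, best_score, errors


def always_fails(cfg):
    # Hypothetical stand-in for a model fit that raises on bad input.
    raise TypeError("model could not be fitted")


try:
    pick_best([(0, 0, 0), (1, 0, 0)], always_fails)
except UnboundLocalError as exc:
    print("caught:", type(exc).__name__)  # caught: UnboundLocalError

print(pick_best_reporting([(0, 0, 0), (1, 0, 0)], always_fails)[3])
```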

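If I try to run the single model line with arbitrary values of P, D and Q, I get the error below, so my assumption is that the problem lies in the first block and how it handles this dataset: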

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
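This message comes from NumPy: np.isnan is only defined for numeric (and date/time) dtypes, so it raises when handed an array of strings. A quick reproduction, assuming the model input ended up as string labels such as the column name turnover (an assumption; the question does not show the exact single-model line or its full traceback):

```python
import numpy as np

labels = np.asarray(["turnover"])   # e.g. a list of column labels
print(labels.dtype)                 # <U8

try:
    np.isnan(labels)                # string dtype has no isnan loop
except TypeError as exc:
    print("TypeError:", exc)

numbers = np.asarray([120, 135, 128], dtype=np.int32)
print(np.isnan(numbers))            # [False False False]
```

The int32 case works because int32 can be safely cast to float64, whereas no safe cast exists from a Unicode string dtype.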

1 Answer

Answer 0 (score: 0)

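After stepping through the code line by line, the problem became clear: this code cannot handle a DataFrame and needs a Series instead. In evaluate_sarima_model, past = [x for x in train] iterates over a DataFrame, which yields its column labels rather than its values, so past becomes ['turnover'] and every model fit fails. Because the bare except: silently skips each failure, seas is never assigned, hence the UnboundLocalError. The fix is simply to call .squeeze() on the df (for example turnover_month.squeeze()) so that a Series is passed in.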