Evaluate and track/plot TimeSeriesSplit

Time: 2019-03-11 16:13:47

Tags: python datetime xgboost

I have time series data, so I used a TimeSeriesSplit with 3 splits for an XGBRegressor. See the code below.

    from sklearn.model_selection import TimeSeriesSplit
    from xgboost.sklearn import XGBRegressor
    from sklearn.metrics import mean_squared_error
    import math

    tscv = TimeSeriesSplit(n_splits=3)
    print(tscv)

    # `data` is the time-ordered DataFrame; the last column is the target
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values

    for train_index, test_index in tscv.split(X):
        print("TRAIN:", train_index, "TEST:", test_index)

        X_test = X[test_index]
        X_train = X[train_index]

        y_test = y[test_index]
        y_train = y[train_index]

        # one model per fold, scored on that fold's test window only
        model = XGBRegressor()
        model.fit(X_train, y_train)
        y_pred_test = model.predict(X_test)
        rmse = math.sqrt(mean_squared_error(y_test, y_pred_test))
        print(rmse)

My questions are:

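1) Is the rmse result already the average of the three folds? If it is just one of the three results, how do I display the other two so that I can compute the average?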

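2) How can I track the validation RMSE of each fold so that I can plot training/test curves and check for over- or underfitting? Without TimeSeriesSplit, I define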

    model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
              eval_metric=['rmse'], verbose=True)

which gives:

    [1]     validation_0-rmse:0.565858      validation_1-rmse:0.574236
    [2]     validation_0-rmse:0.550307      validation_1-rmse:0.567077
    [3]     validation_0-rmse:0.53824       validation_1-rmse:0.56323
                   ...                            ...

0 answers:

No answers yet.