I have time series data, so I used TimeSeriesSplit with 3 splits for an XGBRegressor; see the code below.
from sklearn.model_selection import TimeSeriesSplit
from xgboost.sklearn import XGBRegressor
from sklearn.metrics import mean_squared_error
import math
tscv = TimeSeriesSplit(n_splits=3)
print(tscv)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_test = X[test_index]
    X_train = X[train_index]
    y_test = y[test_index]
    y_train = y[train_index]
model = XGBRegressor()
model.fit(X_train, y_train)
y_pred_test = model.predict(X_test)
rmse = (math.sqrt(mean_squared_error(y_test, y_pred_test)))
print(rmse)
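My questions are:
1) Is the rmse result already the average of the three folds? If it is the result of only one fold, how can I show the other two results so that I can compute the average?
2) How can I track the validation RMSE of each fold so that I can plot train/test curves and check for overfitting/underfitting? Without TimeSeriesSplit I define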
model.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)],
          eval_metric=['rmse'], verbose=True)
so I get
[1] validation_0-rmse:0.565858 validation_1-rmse:0.574236
[2] validation_0-rmse:0.550307 validation_1-rmse:0.567077
[3] validation_0-rmse:0.53824 validation_1-rmse:0.56323
... ...