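I'm working in R and trying to find the best hyperparameters for an xgboost model I want to run. My dataset has about 700 variables (some numeric, others one-hot encoded) and about 25,000 observations. I'm trying to predict whether each observation is large (prediction = 1) or small (prediction = 0). The problem is that when I run the xgb.cv function, train-error and test-error do not change from one iteration to the next. My code and the resulting printout are below. Can anyone explain why the error stays the same? Thanks very much!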
The R code:
library(xgboost)

# pred, resp and train are defined earlier (not shown)
dtrain <- xgb.DMatrix(data = pred[train, ], label = resp[train])

# 5-fold cross-validation with early stopping on the test metric
xgb.cv(data = dtrain,
       params = list(objective = "binary:logistic",
                     eta = 0.01,
                     max_depth = 10,
                     min_child_weight = 20,
                     colsample_bytree = 0.2),
       nfold = 5,
       nrounds = 100,
       verbose = TRUE,
       early_stopping_rounds = 8,
       maximize = FALSE)
Console output:
[1] train-error:0.014422+0.000491 test-error:0.014422+0.001965
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 8 rounds.
[2] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[3] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[4] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[5] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[6] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[7] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[8] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[9] train-error:0.014422+0.000491 test-error:0.014422+0.001965
Stopping. Best iteration:
[1] train-error:0.014422+0.000491 test-error:0.014422+0.001965
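One thing that may be worth checking: for binary:logistic, the error metric counts a prediction above 0.5 as class 1. If the model predicts below 0.5 for every row (an assumption here, not something the output above confirms), the error is simply the share of 1s in the labels and cannot move until some prediction crosses 0.5. The sketch below illustrates this with made-up labels; `labels` and `constant_pred` are hypothetical stand-ins, not the real data.

```r
# Hypothetical stand-in for resp[train]: 0/1 labels with very few 1s.
set.seed(1)
labels <- rbinom(25000, size = 1, prob = 0.0144)

# Share of positive labels in the data.
positive_rate <- mean(labels == 1)

# A "model" that outputs the same probability below 0.5 for every row.
constant_pred <- rep(0.4, length(labels))

# Classification error at a 0.5 cut-off: predicted class differs from label.
baseline_error <- mean((constant_pred > 0.5) != (labels == 1))

cat("positive rate: ", positive_rate, "\n")
cat("baseline error:", baseline_error, "\n")
```

With the real data, the equivalent check would be to compare mean(resp[train] == 1) against the reported 0.014422.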
Thanks again for your help!
Edit/update: I tried the following code, and the output is shown below.
New code using multiple parameter values:
param <- list(objective = "binary:logistic",
              eta = c(0.01, 0.05, 0.1, 0.5, 1),
              max_depth = 10,
              min_child_weight = 20,
              colsample_bytree = c(0.1, 0.2, 0.5, 1))
cv <- xgb.cv(data = dtrain,
             params = param,
             nfold = 5,
             nrounds = 100,
             verbose = TRUE,
             early_stopping_rounds = 8,
             maximize = FALSE)
Console output:
[1] train-error:0.014422+0.000189 test-error:0.014422+0.000756
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 8 rounds.
[2] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[3] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[4] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[5] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[6] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[7] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[8] train-error:0.014422+0.000189 test-error:0.014422+0.000756
[9] train-error:0.014422+0.000189 test-error:0.014422+0.000756
Stopping. Best iteration:
[1] train-error:0.014422+0.000189 test-error:0.014422+0.000756
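The run above produced a single sequence of iterations, so the vectors in param do not appear to have been treated as a grid. Assuming xgb.cv expects one value per parameter, a minimal sketch of enumerating the combinations by hand is shown below. `make_params` and `param_sets` are hypothetical names introduced for illustration, and the xgb.cv call is left as a comment so the snippet runs without the data.

```r
# Candidate values copied from the param list above.
grid <- expand.grid(eta = c(0.01, 0.05, 0.1, 0.5, 1),
                    colsample_bytree = c(0.1, 0.2, 0.5, 1))

# Build one params list (single value per parameter) for row i of the grid.
make_params <- function(i) {
  list(objective = "binary:logistic",
       eta = grid$eta[i],
       max_depth = 10,
       min_child_weight = 20,
       colsample_bytree = grid$colsample_bytree[i])
}
param_sets <- lapply(seq_len(nrow(grid)), make_params)

cat("combinations:", length(param_sets), "\n")
str(param_sets[[1]])

# Each element could then be passed to xgb.cv in turn, for example:
# cv_i <- xgb.cv(data = dtrain, params = param_sets[[i]], nfold = 5,
#                nrounds = 100, early_stopping_rounds = 8, maximize = FALSE)
```

The best test metric from each run could then be stored next to its row in grid for comparison.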