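I am trying to verify the cross-validated predictions from xgb.cv() by manually checking the values in model.cv$pred, but the results differ. This is with R 3.6.1 and XGBoost 0.90.0.2 on Windows 7. I checked the results with the auc metric from both the AUC and Metrics libraries.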
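library(xgboost)
library(AUC)
library(mlbench)
library(broom)
library(Metrics)
library(dplyr)  # provides %>%, bind_rows(), group_by(), summarise(), pull()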
data(Sonar)
xgb.train.data <- xgb.DMatrix(as.matrix(Sonar[,1:60]), label = as.numeric(Sonar$Class)-1)
param <- list(objective = "binary:logistic")
model.cv <- xgb.cv(params = param,
                   data = xgb.train.data,
                   nrounds = 50,
                   early_stopping_rounds = 10,
                   nfold = 3,
                   prediction = TRUE,
                   eval_metric = "auc")
model.cv$evaluation_log[model.cv$best_iteration,]
#    iter train_auc_mean train_auc_std test_auc_mean test_auc_std
# 1:   50              1             0      0.911689   0.03812221
The output of model.cv shows a test_auc_mean of about 0.91.
#########################################
# Manually check by calculating the AUC for each CV fold
#########################################
z <- lapply(model.cv$folds, function(x) {
  # Threshold the out-of-fold predictions at 0.5 to get class labels
  pred <- as.factor(ifelse(model.cv$pred[x] > 0.5, 1, 0))
  true <- as.factor((as.numeric(Sonar$Class) - 1)[x])
  index <- x
  data.frame(pred, true, index)
})
names(z) <- paste("folds", 1:3, sep = "_")
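# Per-fold AUC (namespaced, because Metrics::auc masks AUC::auc when loaded later)
z %>%
  bind_rows(.id = "id") %>%
  group_by(id) %>%
  summarise(auroc = AUC::roc(true, pred) %>%
              AUC::auc())

# Mean AUC across the folds
z %>%
  bind_rows(.id = "id") %>%
  group_by(id) %>%
  summarise(auroc = AUC::roc(true, pred) %>%
              AUC::auc()) %>%
  pull(auroc) %>%
  mean()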
# A tibble: 3 x 2
# id auroc
# <chr> <dbl>
#1 folds_1 0.792
#2 folds_2 0.769
#3 folds_3 0.874
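# Column 1: CV predictions, column 2: true labels, column 3: thresholded class
dat <- data.frame(model.cv$pred, as.numeric(Sonar$Class) - 1)
dat$prediksi <- ifelse(dat[, 1] > 0.5, 1, 0)
# Loop over the folds that exist (nfold = 3) instead of a hard-coded 1:10
for (i in seq_along(model.cv$folds)) {
  print(Metrics::auc(dat[model.cv$folds[[i]], 2], dat[model.cv$folds[[i]], 3]))
}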
#[1] 0.7701681
#[1] 0.7664141
#[1] 0.8851976
As shown above, the AUC reported by the cross-validation is 0.91, which differs from my manual check. Am I missing something?
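One thing that may explain part of the gap: both manual checks threshold the predictions at 0.5 before computing the AUC, whereas AUC is normally computed on the raw predicted probabilities. Thresholding collapses the ranking into two values, which generally changes the result. Below is a minimal base-R sketch of that effect. The helper rank_auc() and the toy label/prob vectors are hypothetical and only for illustration, they are not part of xgboost, AUC, or Metrics.

```r
# Rank-based (Mann-Whitney) AUC; tied scores receive average ranks.
# Hypothetical helper, not taken from any package.
rank_auc <- function(label, score) {
  r <- rank(score)                      # average ranks handle ties
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data standing in for one held-out fold (made-up values).
label <- c(0, 0, 0, 1, 1, 1)
prob  <- c(0.10, 0.40, 0.70, 0.60, 0.80, 0.90)

# AUC on the raw probabilities keeps the full ranking.
auc_prob <- rank_auc(label, prob)

# AUC on 0/1 classes after thresholding at 0.5, as in the manual check.
auc_class <- rank_auc(label, ifelse(prob > 0.5, 1, 0))

print(c(prob = auc_prob, class = auc_class))
```

With the objects from the post, the same per-fold check on probabilities would look something like sapply(model.cv$folds, function(idx) rank_auc((as.numeric(Sonar$Class) - 1)[idx], model.cv$pred[idx])), and the mean of those values is the number to compare against test_auc_mean.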