Question

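I am trying to compare the cross-validation score reported by xgb.cv() with a manual check of the out-of-fold predictions it returns in cv$pred, but the two results differ. This is with R 3.6.1 and XGBoost 0.90.0.2 on Windows 7. I checked it using the auc metric from both the AUC and Metrics libraries.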

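    library(xgboost)
    library(AUC)
    library(mlbench)
    library(broom)
    library(Metrics)
    library(dplyr)  # needed for %>%, bind_rows(), group_by(), summarise(), pull()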

    data(Sonar)

    xgb.train.data <- xgb.DMatrix(as.matrix(Sonar[,1:60]), label = as.numeric(Sonar$Class)-1)
    param <- list(objective = "binary:logistic")

    model.cv <- xgb.cv(param = param,
                       data = xgb.train.data,
                       nrounds = 50,
                       early_stopping_rounds = 10,
                       nfold = 3,
                       prediction = TRUE,
                       eval_metric = "auc")

    model.cv$evaluation_log[model.cv$best_iteration,]
    #    iter train_auc_mean train_auc_std test_auc_mean test_auc_std
    # 1:   50              1             0      0.911689   0.03812221

The output of model.cv shows a test_auc_mean of about 0.91.

        #########################################
        # Manually check by calculating the result for each CV fold
        #########################################

        z <- lapply(model.cv$folds, function(x){
          pred <- as.factor(ifelse(model.cv$pred[x] > 0.5,1,0))
          true <-  as.factor((as.numeric(Sonar$Class)-1)[x])
          index <- x
          out <- data.frame(pred, true, index)
          out
        })

        names(z) <- paste("folds", 1:3, sep = "_")

        z %>%
          bind_rows(.id = "id") %>%
          group_by(id) %>%
          summarise(auroc = roc(true, pred) %>%
                   auc())

        z %>%
          bind_rows(.id = "id") %>%
          group_by(id) %>%
          summarise(auroc = roc(true, pred) %>%
                   auc()) %>%
          pull(auroc) %>%
          mean   
        # A tibble: 3 x 2
        #  id      auroc
        #  <chr>   <dbl>
        #1 folds_1 0.792
        #2 folds_2 0.769
        #3 folds_3 0.874

        dat = data.frame(model.cv$pred,(as.numeric(Sonar$Class)-1))
        dat$prediksi = ifelse(dat[,1]>0.5,1,0)
        for (i in seq_along(model.cv$folds)) {  # one iteration per CV fold (nfold = 3)
            print(auc(dat[model.cv$folds[[i]], 2], dat[model.cv$folds[[i]], 3]))
        }
        #[1] 0.7701681
        #[1] 0.7664141
        #[1] 0.8851976

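As shown above, the CV log reports about 0.91, which differs from the manual check (roughly 0.77 to 0.89 per fold). Am I missing something?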
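One thing that may be worth ruling out: both manual checks threshold the predictions at 0.5 before computing AUC, whereas a ranking metric such as AUC is normally computed on the raw predicted probabilities (and, as far as I understand, that is what xgb.cv does for eval_metric = "auc"). The sketch below is not the Sonar data; it uses made-up toy labels and scores and a hypothetical base-R helper, auc_rank(), only to illustrate that thresholding alone can change the AUC.

```r
# Hypothetical helper (not from any package): rank-based (Mann-Whitney) AUC.
# labels are 0/1, scores are numeric; ties receive average ranks.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data, invented for illustration only.
labels <- c(0, 0, 0, 1, 0, 1, 1, 1)
scores <- c(0.10, 0.20, 0.60, 0.55, 0.70, 0.80, 0.90, 0.95)

# AUC on the raw probabilities.
auc_prob <- auc_rank(labels, scores)

# AUC after thresholding at 0.5, as in the manual checks above.
auc_hard <- auc_rank(labels, ifelse(scores > 0.5, 1, 0))

print(c(probabilities = auc_prob, thresholded = auc_hard))
```

On this toy data the thresholded AUC (0.75) is lower than the probability-based AUC (0.875). If the same effect applies here, passing model.cv$pred[x] to the AUC function directly, without the ifelse(... > 0.5, 1, 0) step, should bring the per-fold values closer to the logged test_auc_mean.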

xgboost cross-validation log result differs from the predictions
