XGBoost cross-validation log result differs from the predictions

Time: 2019-08-18 16:24:11

Tags: r xgboost

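I am trying to compare the cross-validated AUC reported by xgb.cv() with a manual check based on the out-of-fold predictions it returns in model.cv$pred, but the results differ. This is on R 3.6.1 and Windows 7 with XGBoost 0.90.0.2. I checked it with the auc metrics from the AUC and Metrics packages.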

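    library(xgboost)
    library(AUC)
    library(mlbench)
    library(broom)
    library(Metrics)
    library(dplyr)   # assumed: needed for %>%, bind_rows, group_by, summarise, pull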

    data(Sonar)

    xgb.train.data <- xgb.DMatrix(as.matrix(Sonar[,1:60]), label = as.numeric(Sonar$Class)-1)
    param <- list(objective = "binary:logistic")

    model.cv <- xgb.cv(param = param,
                       data = xgb.train.data,
                       nrounds = 50,
                       early_stopping_rounds = 10,
                       nfold = 3,
                       prediction = TRUE,
                       eval_metric = "auc")

    model.cv$evaluation_log[model.cv$best_iteration,]
    #    iter train_auc_mean train_auc_std test_auc_mean test_auc_std
    # 1:   50              1             0      0.911689   0.03812221

The xgb.cv output shows a test_auc_mean of about 0.91.

        #########################################
        # Manually check by calculating the result for each CV fold
        #########################################

        z <- lapply(model.cv$folds, function(x){
          pred <- as.factor(ifelse(model.cv$pred[x] > 0.5,1,0))
          true <-  as.factor((as.numeric(Sonar$Class)-1)[x])
          index <- x
          out <- data.frame(pred, true, index)
          out
        })

        names(z) <- paste("folds", 1:3, sep = "_")

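        # Per-fold AUC from the thresholded predictions (AUC package).
        # Note: AUC::roc() takes (predictions, labels); the argument order is kept as posted.
        z %>%
          bind_rows(.id = "id") %>%
          group_by(id) %>%
          summarise(auroc = AUC::roc(true, pred) %>%
                   AUC::auc())
        # A tibble: 3 x 2
        #   id      auroc
        #   <chr>   <dbl>
        # 1 folds_1 0.792
        # 2 folds_2 0.769
        # 3 folds_3 0.874

        # Mean of the per-fold values
        z %>%
          bind_rows(.id = "id") %>%
          group_by(id) %>%
          summarise(auroc = AUC::roc(true, pred) %>%
                   AUC::auc()) %>%
          pull(auroc) %>%
          mean()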

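        # Same check with Metrics::auc(actual, predicted) on the thresholded predictions
        dat <- data.frame(model.cv$pred, (as.numeric(Sonar$Class)-1))
        dat$prediksi <- ifelse(dat[,1] > 0.5, 1, 0)
        # Loop over the folds that actually exist (nfold = 3), not a hard-coded 1:10
        for (i in seq_along(model.cv$folds)) {
          idx <- model.cv$folds[[i]]
          print(Metrics::auc(dat[idx, 2], dat[idx, 3]))
        }
        #[1] 0.7701681
        #[1] 0.7664141
        #[1] 0.8851976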
As shown above, the CV log reports about 0.91, which differs from the manual checks. Am I missing something?
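One possible source of the gap, offered as an assumption rather than a confirmed diagnosis: the "auc" metric in the evaluation log is presumably computed from the predicted probabilities, whereas both manual checks first threshold the predictions at 0.5 and then compute AUC on the resulting 0/1 classes. Thresholding discards the ranking within each class, which usually lowers the AUC. The base-R sketch below illustrates that effect on hypothetical toy data; `auc_rank` is a helper introduced here for illustration and is not part of xgboost, AUC, or Metrics.

```r
# Rank-based (Mann-Whitney) AUC. Assumes labels are coded 0/1 and
# that both classes are present. Hypothetical helper for illustration.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                      # ties receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data invented for this sketch; not taken from the Sonar run.
labels <- c(0, 0, 0, 0, 1, 1, 1, 1)
probs  <- c(0.10, 0.30, 0.55, 0.60, 0.52, 0.70, 0.80, 0.90)

auc_prob <- auc_rank(labels, probs)                      # uses the full ranking
auc_hard <- auc_rank(labels, ifelse(probs > 0.5, 1, 0))  # uses only 0/1 classes

print(c(probability = auc_prob, thresholded = auc_hard))

# With the objects from the question, the same helper could be applied
# per fold to the raw probabilities (not run here):
# sapply(model.cv$folds, function(idx)
#   auc_rank(as.numeric(Sonar$Class)[idx] - 1, model.cv$pred[idx]))
```

On the toy data the probability-based AUC is 0.875 and the thresholded one is 0.75. If the same pattern holds for the Sonar folds, computing the per-fold AUC on model.cv$pred without thresholding should land much closer to the logged test_auc_mean.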

0 answers:

No answers yet.