Question

I'm trying to use XGBoost for binary classification and, being a newbie, I've run into a problem.

First, I trained a model "fit":

fit <- xgboost(
    data = dtrain #as.matrix(dat[,predictors])
    , label = label 
    #, eta = 0.1                        # step size shrinkage 
    #, max_depth = 25                   # maximum depth of tree 
    , nround=100
    #, subsample = 0.5
    #, colsample_bytree = 0.5           # part of data instances to grow tree
    #, seed = 1
    , eval_metric = "merror"        # or "mlogloss" - evaluation metric 
    , objective = "binary:logistic" # we will train a binary classification model using logistic regression for classification; other options: "multi:softprob", "multi:softmax" = multi class classification
    , num_class = 2                 # Number of classes in the dependent variable.
    #, nthread = 3                  # number of threads to be used 
    #, silent = 1
    #, prediction=T
)

Then I tried to use that model to predict labels for a new test data.frame:

predictions = predict(fit, as.matrix(test))
print(str(predictions))

As a result, I got twice as many single probability values as there are rows in the test data:

num [1:62210] 0.0567 0.0455 0.023 0.0565 0.0642 ...

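I read that, since I'm using binary classification, I should get 2 probabilities for each row of the test data.frame: one for label1 and one for label2. But how do I join that list of predictions (or whatever type of object "predictions" is) with my data.frame "test" and get the prediction with the highest probability? I tried to combine "predictions" and "test", but the merged data.frame had 62k rows (instead of the 31k in the original "test"). Please tell me how to get the prediction for each row.

Second question: since "predictions" holds 2 probabilities (for label1 and label2) for each row of the "test" data.frame, I expected the two values to sum to 1. But for one test row I get 2 small values: 0.0455073267221451 0.0621210783720016. Their sum is far less than 1... Why is that?

Please explain these two things to me. I searched, but couldn't find a relevant topic with a clear explanation...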

Answer 1

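You first need to create your test set: a matrix containing the p columns used in the training part, without the "outcome" variable (the y of the model).

Keep the labels of the test set (the truth) as an as.numeric vector.

After that it is just a couple of instructions. I suggest using the confusionMatrix function from the caret package.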

library(caret)
library(xgboost)

test_matrix <- data.matrix(test[, setdiff(names(test), "outcome"), drop = FALSE]) # your test matrix (without the labels)
test_labels <- as.numeric(test$outcome) # the test labels (assumes outcome is already coded as 0/1)
xgb_pred <- predict(fit, test_matrix) # this will give you just one probability (it will be a simple vector)
xgb_pred_class <- as.numeric(xgb_pred > 0.50) # to get your predicted labels 
# keep in mind that 0.50 is a threshold that can be modified.

confusionMatrix(as.factor(xgb_pred_class), as.factor(test_labels))
# this will get your confusion Matrix
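
To address the "how do I join the predictions with my test data.frame" part of the question, here is a minimal base-R sketch. It assumes the prediction vector has exactly one value per test row, which is what "binary:logistic" normally returns (the probability of label 1); the doubled length seen in the question is presumably a side effect of also passing num_class = 2, a parameter meant for the "multi:*" objectives. The data frame, its columns `x1` and `x2`, and the probability values below are made-up stand-ins for `test` and `xgb_pred`, not data from the thread.

```r
# Hypothetical stand-ins for the objects in the thread: `test` is a small
# data.frame and `xgb_pred` plays the role of predict(fit, test_matrix).
test <- data.frame(x1 = c(0.2, 1.5, 3.1, 0.7),
                   x2 = c(10, 20, 30, 40))
xgb_pred <- c(0.0567, 0.8455, 0.5230, 0.0642)

# One probability per row is what makes a column-wise join possible.
stopifnot(length(xgb_pred) == nrow(test))

result <- cbind(
  test,
  prob_1     = xgb_pred,                   # P(label == 1)
  prob_0     = 1 - xgb_pred,               # complement, so each row sums to 1
  pred_label = as.integer(xgb_pred > 0.5)  # 0.5 is an adjustable threshold
)

print(result)
```

With a single probability per row, the probability of the other label is simply its complement, so the two columns sum to 1 by construction. If the length check fails on a real model, the training parameters are the first place to look.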

R script: xgboost for binary classification - how to get predicted labels?
