Missing part of the confusion matrix for XGBoost in R

Time: 2019-09-26 12:58:34

Tags: r xgboost confusion-matrix

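I am trying to get a confusion matrix from my XGBoost model and compute the accuracy. However, my confusion matrix is incomplete: the entire FALSE row is missing, and it looks like this: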

y_pred   0   1
  TRUE 526 482

As a result, I cannot compute the accuracy. Here is my code:

# Splitting the dataset into the training set and test set
dataset$Good.Bad.Stock = factor(dataset$Good.Bad.Stock, levels = c(0,1))
training_set = dataset[1:2740,]
test_set = dataset[2741:3748,]
data = as.factor(as.character(training_set$Good.Bad.Stock))
data = replace(training_set$Good.Bad.Stock, is.na(training_set$Good.Bad.Stock), 0)
data

# Fitting XGBoost to the Training set
classifier_XGB = xgboost(data = as.matrix(training_set[-63]), 
                     label = data, 
                     nrounds = 15,                      
                     objective = "binary:logistic")

# Predicting the Test set results
pred_data=as.matrix(test_set[-63])
y_pred = predict(classifier_XGB, pred_data)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
cm_XGB = table(y_pred, test_set$Good.Bad.Stock)
cm_XGB

# Evaluate Model
accuracy_XGB = (cm_XGB[1,1] + cm_XGB[2,2]) / (cm_XGB[1,1] + cm_XGB[2,2] + cm_XGB[1,2] + cm_XGB[2,1])
print(accuracy_XGB)

Thank you for your help!

1 answer:

Answer 0: (score: 0)

I have not run the code, but I suspect the problem is in this line:

y_pred = (y_pred > 0.5)

Print y_pred before running this line. You will probably see either a vector of 1s or probabilities that are all greater than 0.5.
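A minimal sketch of that check, using made-up probabilities in place of the real predict() output (the `probs` and `actual` vectors are hypothetical stand-ins, not the asker's data). It shows that when every probability is above 0.5, table() returns a single TRUE row, and that converting the predictions to a factor with both levels fixed keeps the empty row so the usual indexing works:

```r
# Hypothetical stand-ins for predict() output and the true test labels
probs  <- c(0.62, 0.71, 0.55, 0.90, 0.58, 0.66)
actual <- factor(c(0, 1, 0, 1, 1, 0), levels = c(0, 1))

# Inspect the raw probabilities before thresholding
print(summary(probs))

# A logical vector containing only TRUE yields a one-row table
cm_logical <- table(probs > 0.5, actual)
print(dim(cm_logical))

# Fixing both levels keeps the all-zero row for predicted class 0
pred_class <- factor(as.integer(probs > 0.5), levels = c(0, 1))
cm_full <- table(Predicted = pred_class, Actual = actual)
print(cm_full)

# Accuracy from the full table: correct predictions over all predictions
accuracy <- sum(diag(cm_full)) / sum(cm_full)
print(accuracy)
```

With the levels fixed, the accuracy formula no longer fails on a missing row, although a model that predicts one class for every case still needs fixing upstream.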

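This could be caused by a misconfigured model (read up on the xgb parameters) or by a highly imbalanced dataset (which does not appear to be the case in the test set).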
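On the imbalance point, the single row in the question already gives the test-set class totals, because the columns of table(y_pred, test_set$Good.Bad.Stock) are the actual classes. A short sketch using only those two counts (the variable names are illustrative):

```r
# Column totals from the question's one-row table: actual class 0 and class 1
test_counts <- c(`0` = 526, `1` = 482)

# Share of each class in the test set
balance <- test_counts / sum(test_counts)
print(round(balance, 3))

# If every case is predicted TRUE, only the actual-1 cases are correct
accuracy_all_true <- unname(test_counts["1"] / sum(test_counts))
print(round(accuracy_all_true, 3))
```

The two classes are close to evenly split in the test set, so imbalance there is unlikely to explain the all-TRUE predictions. The training set would need the same check, for example with prop.table(table(...)) on its label column.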

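Edit: Of course, you must make sure that the response variable is typed as a factor. Also, you should set the objective function to binary. As I said, I strongly recommend reading some introductory material on xgb: https://www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/ https://cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html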
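One thing worth checking alongside the variable type: a factor is convenient for table(), but the "binary:logistic" objective expects numeric labels in the range 0 to 1. If a factor is coerced with as.numeric() anywhere along the way, the labels become the internal level codes 1 and 2 instead. Whether this explains the behaviour in the question is an assumption, since the data is not available. The sketch below uses a hypothetical label vector in base R only:

```r
# Hypothetical label column, typed as a factor like Good.Bad.Stock in the question
label_factor <- factor(c(0, 1, NA, 1, 0), levels = c(0, 1))

# as.numeric() on a factor returns the internal level codes (1 and 2), not 0 and 1
codes <- as.numeric(label_factor)
print(codes)

# Going through character first recovers the original 0/1 values
label_numeric <- as.numeric(as.character(label_factor))

# Replace missing labels with 0, as the question's code does
label_numeric[is.na(label_numeric)] <- 0
print(label_numeric)
```

A vector like `label_numeric` is the kind of value that could then be passed as the label argument in the question's xgboost() call.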