Question

I am trying to train a model with the bstTree method and print the confusion matrix. adverse_effects is my class attribute.

set.seed(1234)
splitIndex <- createDataPartition(attended_num_new_bstTree$adverse_effects, p = .80, list = FALSE, times = 1)
trainSplit <- attended_num_new_bstTree[ splitIndex,]
testSplit <- attended_num_new_bstTree[-splitIndex,]

ctrl <- trainControl(method = "cv", number = 5)
model_bstTree <- train(adverse_effects ~ ., data = trainSplit, method = "bstTree", trControl = ctrl)


predictors <- names(trainSplit)[names(trainSplit) != 'adverse_effects']
pred_bstTree <- predict(model_bstTree$finalModel, testSplit[,predictors])


plot.roc(auc_bstTree)

conf_bstTree= confusionMatrix(pred_bstTree,testSplit$adverse_effects)

But I get the error 'Error in confusionMatrix.default(pred_bstTree, testSplit$adverse_effects) : The data must contain some levels that overlap the reference.'

> max(pred_bstTree)
[1] 1.03385
> min(pred_bstTree)
[1] 1.011738

> unique(trainSplit$adverse_effects)
[1] 0 1
Levels: 0 1

How can I fix this?

> head(trainSplit)
   type New_missed Therapytypename New_Diesease gender adverse_effects change_in_exposure other_reasons other_medication
5     2          1              14           13      2               0                  0             0                0
7     2          0              14           13      2               0                  0             0                0
8     2          0              14           13      2               0                  0             0                0
9     2          0              14           13      2               1                  0             0                0
11    2          1              14           13      2               0                  0             0                0
12    2          0              14           13      2               0                  0             0                0
   uvb_puva_type missed_prev_dose skintypeA skintypeB Age DoseB DoseA
5              5                1         1         1  22 3.000     0
7              5                0         1         1  22 4.320     0
8              5                0         1         1  22 4.752     0
9              5                0         1         1  22 5.000     0
11             5                1         1         1  22 5.000     0
12             5                0         1         1  22 5.000     0

Answer 1

I had a similar problem that produced this same error. I used the confusionMatrix function:

confusionMatrix(actual, predicted, cutoff = 0.5)

and got the following error: Error in confusionMatrix.default(actual, predicted, cutoff = 0.5) : The data must contain some levels that overlap the reference.

I checked a few things:

class(actual) -> numeric

class(predicted) -> integer

unique(actual) -> many values, since these are probabilities

unique(predicted) -> 2 levels: 0 and 1

I concluded that the cutoff part of the function was not being applied properly, so I applied the cutoff myself beforehand:

predicted<-ifelse(predicted> 0.5,1,0)

and then ran the confusionMatrix function, which now works fine:

cm <- confusionMatrix(actual, predicted)
cm$table

which produced the correct result.
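
A minimal sketch of that workaround, assuming caret's confusionMatrix and made-up scores and labels (none of these values come from the question). The cutoff is applied by hand, and the result is turned into a factor with the same levels as the true labels so that the two arguments overlap. The arguments are named explicitly to avoid any ambiguity about their order:

```r
library(caret)

# Hypothetical scores and true labels, invented for illustration only
scores <- c(0.10, 0.80, 0.35, 0.60, 0.90, 0.20)
actual <- factor(c(0, 1, 0, 0, 1, 0), levels = c(0, 1))

# Apply the 0.5 cutoff manually, then reuse the levels of the true labels
predicted <- factor(ifelse(scores > 0.5, 1, 0), levels = levels(actual))

# Both inputs are now factors with levels "0" and "1"
cm <- confusionMatrix(data = predicted, reference = actual)
cm$table
```

Reusing levels(actual) matters even when the cutoff happens to produce only one class: the unused level is kept, so the levels still overlap.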

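One takeaway for your case, which may improve the interpretation once you get the code working: you mixed up the input values of the confusion matrix (according to the confusionMatrix package documentation). Instead of: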

conf_bstTree= confusionMatrix(pred_bstTree,testSplit$adverse_effects)

you should have written:

conf_bstTree= confusionMatrix(testSplit$adverse_effects,pred_bstTree)

As I said, once you find a way to make it work, this will most likely help you interpret the confusion matrix.
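
A note on argument order, offered as a sketch: which argument comes first depends on whose confusionMatrix is being called. caret documents its function as confusionMatrix(data, reference), where data holds the predicted classes and reference the true ones. An actual-first call with a cutoff argument looks more like ModelMetrics::confusionMatrix, as far as I know. The toy vectors below are made up; they show that swapping the two arguments in caret transposes the table, which changes which cells count as false positives and which as false negatives:

```r
library(caret)

# Hypothetical predicted and true classes, invented for illustration only
pred <- factor(c(1, 1, 0, 1), levels = c(0, 1))
ref  <- factor(c(1, 0, 0, 0), levels = c(0, 1))

# caret's documented order: predictions as `data`, truth as `reference`
t_documented <- confusionMatrix(data = pred, reference = ref)$table

# Swapped order: rows and columns trade places
t_swapped <- confusionMatrix(data = ref, reference = pred)$table

t_documented
t_swapped

# The swapped table holds the same counts, transposed
same_counts <- identical(as.vector(t_documented), as.vector(t(t_swapped)))
same_counts
```

So it is worth checking the documentation of the package actually loaded before deciding which order to use.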

Hope it helps.

Answer 2

> max(pred_bstTree)
[1] 1.03385
> min(pred_bstTree)
[1] 1.011738

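And the error says it all. Plotting the ROC is just a way to check the effect of different threshold points. Predictions are rounded according to the threshold: with a threshold of 0.5, for example, 0.7 is converted to 1 (the TRUE class) and 0.3 to 0 (the FALSE class). The threshold lies in the range (0, 1).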

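In your case, whatever threshold you pick, every observation ends up in the TRUE class, because even the minimum prediction is greater than 1. (That is why @phiver wondered whether you are doing regression rather than classification.) With no zeros predicted, the predictions contain no level that matches the zero level in adverse_effects, hence the error.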
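
To make this concrete, here is a small base R sketch. Only the minimum and maximum come from the question; the middle value and the three reference labels are made up. It assumes that confusionMatrix coerces a numeric vector to a factor before comparing levels, which is consistent with the error message but may differ between caret versions:

```r
# Minimum and maximum are from the question; 1.02 is a made-up filler value
pred <- c(1.011738, 1.02, 1.03385)

# Hypothetical true labels with the same levels as adverse_effects
reference <- factor(c(0, 1, 0), levels = c(0, 1))

# Coercing raw scores to a factor yields one level per distinct score
raw_levels <- levels(as.factor(pred))
raw_levels

# None of those levels is "0" or "1", so nothing overlaps the reference
overlap <- any(raw_levels %in% levels(reference))
overlap   # FALSE

# Any cutoff in (0, 1) sends every score to class 1
hard <- factor(ifelse(pred > 0.5, 1, 0), levels = levels(reference))
table(hard)
```

After thresholding, the levels do overlap, but class 0 is never predicted, so the resulting confusion matrix would still be uninformative.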

PS: Without the data being posted, it is hard to tell the root cause of the error.
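
One thing that may be worth trying, as an assumption and not a confirmed diagnosis: the question calls predict() on model_bstTree$finalModel, which bypasses caret and can return raw numeric scores. Calling predict() on the train object itself returns a factor for a classification model. The sketch below uses synthetic data and method = "rpart" only to stay light; the data frame toy and its columns x1 and x2 are made up and are not the asker's data:

```r
library(caret)

set.seed(1234)

# Synthetic stand-in data with a two-level factor outcome
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$adverse_effects <- factor(
  ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, 1, 0),
  levels = c(0, 1)
)

idx <- createDataPartition(toy$adverse_effects, p = 0.8, list = FALSE)
train_df <- toy[idx, ]
test_df  <- toy[-idx, ]

ctrl <- trainControl(method = "cv", number = 5)

# "rpart" keeps the sketch light; the question uses method = "bstTree"
fit <- train(adverse_effects ~ ., data = train_df,
             method = "rpart", trControl = ctrl)

# Predict with the train object, not fit$finalModel: the result is a factor
pred <- predict(fit, newdata = test_df)

conf <- confusionMatrix(data = pred, reference = test_df$adverse_effects)
conf$table
```

Because pred carries the same levels as the outcome, the levels always overlap the reference, whatever the underlying model's raw score scale is.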

Confusion matrix for bstTree predictions, error: 'The data must contain some levels that overlap the reference.'