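I am using the SVM from the e1071 package for binary classification. I use both the probabilities attribute and the class predicted by the SVM so that I can compare the results. What confuses me is that the class returned by the predict function (0 or 1) does not seem consistent with the actual probabilities listed in the attribute. For some very high probabilities of level 1, the SVM classifies the row as level 0, and for some low probabilities of level 1, the SVM classifies it as level 1.
Here is some sample code and the results: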
svm_model <- svm(as.factor(CHURNED) ~ .
, scale = FALSE
, data = train
, cost = 1
, gamma = 0.1
, kernel = "radial"
, probability = TRUE
)
test$Pred_Class <- predict(svm_model, test, probability = TRUE)
test$Pred_Prob <- attr(test$Pred_Class, "probabilities")[,1]
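The results are below (the rows are taken from different positions to show a variety of cases):
CHURNED: the response variable being predicted
Pred_Class: the class predicted by the SVM
Pred_Prob: the predicted probability on which the SVM bases its classification?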
CHURNED Pred_Class Pred_Prob
1 0 0.03968526 # --> makes sense
1 0 0.03968526
1 0 0.07033222
1 0 0.11711195
1 0 0.12477983
1 0 0.12827296
1 0 0.12829345
1 0 0.12829345
1 0 0.12829345
1 0 0.12829444
1 0 0.12829927
1 0 0.12829927
1 0 0.12831169
1 0 0.12831169
1 0 0.12831428
1 1 0.13053475 # --> doesn't make sense. Prob is less than 0.5
1 1 0.13053475
1 1 0.13053475
1 1 0.1305348
1 1 0.1305348
1 1 0.1305348
1 1 0.1690807
1 1 0.2206993
1 1 0.2321171
0 0 0.998289 # --> doesn't make sense. Prob is almost 1!
0 0 0.9982887
0 0 0.993133
0 0 0.9898889
1 0 0.9849951
0 0 0.9849951
1 0 0.546427
0 0 0.5440994 # --> doesn't make sense. Prob is more than 0.5
0 0 0.5437889
1 0 0.5417848
0 0 0.5284112
0 0 0.5252177
0 1 0.5180776 # --> makes sense but is not consistent with above example
0 1 0.5180704
1 1 0.5180436
1 1 0.5180436
0 1 0.518043
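This result makes no sense to me at all. The predicted class and the predicted probability do not match. I have checked to make sure I am referencing the correct column in the "probabilities" attribute matrix: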
test$Pred_Class
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[98] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"probabilities")
1 0
6442 0.2369796 0.7630204
6443 0.2520246 0.7479754
6513 0.2322581 0.7677419
6801 0.2309437 0.7690563
6802 0.2244768 0.7755232
6954 0.2322450 0.7677550
6968 0.2537544 0.7462456
6989 0.2352477 0.7647523
7072 0.2322308 0.7677692
...
...
...
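One way to double-check the column reference is to select the column by class label instead of by position, and then compare the class implied by the larger probability with the predicted class. This is a minimal base-R sketch, not the actual model output: `probs` is a hand-built stand-in for `attr(test$Pred_Class, "probabilities")` using the first two rows printed above, and `pred` is a hypothetical stand-in for the predicted classes.

```r
# Stand-in for attr(test$Pred_Class, "probabilities") (assumption: only the
# first two rows of the printed matrix, with the same column order "1", "0").
probs <- matrix(c(0.2369796, 0.7630204,
                  0.2520246, 0.7479754),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("6442", "6443"), c("1", "0")))

# Stand-in for the predicted classes of those two rows (assumption).
pred <- factor(c("0", "0"), levels = c("0", "1"))

# Select the probability of class 1 by column name, not by position,
# so the result does not depend on the column order in the matrix.
p_class1 <- probs[, "1"]

# Class implied by the larger of the two probabilities in each row.
implied <- colnames(probs)[max.col(probs, ties.method = "first")]

# Cross-tabulate the predicted class against the probability-implied class.
print(table(predicted = pred, implied = implied))
```

If the cross-tabulation on the real data has off-diagonal counts, the mismatch is not caused by picking the wrong column.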
Maybe I am interpreting the probabilities incorrectly?