Question

I performed a classification using svm from e1071. The goal is to predict type from all the other variables in dtm.

 dtm[140:145] %>% str()
 'data.frame':  385 obs. of  6 variables:
 $ think   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ actually: num  0 0 0 0 0 0 0 0 0 0 ...
 $ comes   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ able    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hours   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ type    : Factor w/ 4 levels "-1","0","1","9": 4 3 3 3 4 1 4 4 4 3 ...

To train/test the model, I used 10-fold cross-validation.

model <- svm(type~., dtm, cross = 10, gamma = 0.5, cost = 1)
summary(model)

Call:
svm(formula = type ~ ., data = dtm, cross = 10, gamma = 0.5, cost = 1)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 
     gamma:  0.5 

Number of Support Vectors:  385

 ( 193 134 41 17 )


Number of Classes:  4 

Levels: 
 -1 0 1 9

10-fold cross-validation on training data:

Total Accuracy: 50.12987 
Single Accuracies:
 52.63158 51.28205 52.63158 43.58974 60.52632 43.58974 57.89474 48.71795 
 39.47368 51.28205

My question is: how can I generate a confusion matrix for the results? Which columns of model do I have to pass to table() or confusionMatrix() to get the matrix?

Answer 1

As far as I know, there is no way to access the per-fold predictions in the e1071 library when doing cross-validation.

A simple way to do it is to run the cross-validation by hand:

Some data:

library(e1071)
library(mlbench)
data(Sonar)

Generate the folds:

k <- 10
folds <- sample(rep(1:k, length.out = nrow(Sonar)), nrow(Sonar))

Run the models:

z <- lapply(1:k, function(x){
  # fit on all folds except fold x
  model <- svm(Class~., Sonar[folds != x, ], gamma = 0.5, cost = 1, probability = TRUE)
  # predict the held-out fold and keep the predictions next to the true labels
  pred <- predict(model, Sonar[folds == x, ])
  true <- Sonar$Class[folds == x]
  return(data.frame(pred = pred, true = true))
})

Generate a confusion matrix for all held-out samples:

z1 <- do.call(rbind, z)
caret::confusionMatrix(z1$pred, z1$true)
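
Since the question also asks about table(), here is a minimal base-R sketch of the same idea without caret. The vectors pred and true below are made-up stand-ins for z1$pred and z1$true (using the four levels of type from the question), not real results:

```r
# Toy stand-ins for the pooled out-of-fold predictions and true labels
# (made-up values, for illustration only).
lev  <- c("-1", "0", "1", "9")
pred <- factor(c("9", "1", "1", "0", "9", "-1", "1", "9"), levels = lev)
true <- factor(c("9", "1", "0", "0", "9", "-1", "9", "9"), levels = lev)

# Rows are predictions, columns are the reference labels.
cm <- table(Prediction = pred, Reference = true)
print(cm)

# Overall accuracy is the share of counts on the diagonal.
acc <- sum(diag(cm)) / sum(cm)
print(acc)
```

Giving both factors the same levels keeps the table square even when a class is never predicted.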

Generate one for each fold:

lapply(z, function(x){
  caret::confusionMatrix(x$pred, x$true)
})
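
If only per-fold accuracies are needed (something comparable to the "Single Accuracies" and "Total Accuracy" lines in the summary() output), they can be computed directly from the list of fold results. A small sketch, where z_toy is a hypothetical two-fold stand-in for z with made-up values:

```r
# Toy stand-in for z: one data frame of out-of-fold predictions per fold
# (made-up values, 2 folds only).
lev <- c("M", "R")
z_toy <- list(
  data.frame(pred = factor(c("M", "R", "M", "M"), levels = lev),
             true = factor(c("M", "R", "R", "M"), levels = lev)),
  data.frame(pred = factor(c("R", "R", "M", "R"), levels = lev),
             true = factor(c("R", "M", "M", "R"), levels = lev))
)

# Accuracy (in percent) within each fold.
fold_acc <- sapply(z_toy, function(x) 100 * mean(x$pred == x$true))

# Accuracy over all held-out samples pooled together.
pooled    <- do.call(rbind, z_toy)
total_acc <- 100 * mean(pooled$pred == pooled$true)

print(fold_acc)
print(total_acc)
```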

For reproducibility, set the seed prior to creating the folds.
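
A short sketch of what that looks like. The seed value 123, the row count n, and the helper make_folds are arbitrary choices for illustration (in practice n would be nrow() of your data):

```r
k <- 10
n <- 208  # hypothetical number of rows; use nrow(your_data) in practice

# Same fold assignment as above, wrapped in a helper (hypothetical name).
make_folds <- function(n, k) sample(rep(1:k, length.out = n), n)

set.seed(123)            # arbitrary seed, set right before sampling
folds_a <- make_folds(n, k)

set.seed(123)            # resetting the same seed reproduces the same folds
folds_b <- make_folds(n, k)

print(identical(folds_a, folds_b))
print(table(folds_a))    # fold sizes differ by at most one
```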

In general, if you do this kind of thing often, you would pick a higher-level library such as mlr or caret.

Answer 2

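Suppose you want to create a confusion matrix of predicted versus actual values from a dataset named dtm, where your target variable is named type. First, you have to predict values from the model using: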

prediction <- predict(model, dtm)

Then you can create the confusion matrix with:

library(caret)
confusionMatrix(prediction, dtm$type, dnn = c("Prediction", "Reference"))

Hope that is clear enough.

Generating a confusion matrix for CV results of svm in e1071
