Question

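I am looking for a way to apply the confusionMatrix() function from {caret} to specific elements of a split list. I have 3 groups, each with 10 observations of an Actual column and 3 Preds columns.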

library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
              Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))

> dat
   Group Actual Preds1 Preds2 Preds3
1      1      1      1      0      0
2      1      0      0      0      1
3      1      0      0      0      1
4      1      1      1      0      1
...........
27     3      1      0      1      0
28     3      0      0      0      1
29     3      1      0      0      1
30     3      0      1      0      1

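The final solution should create a confusion matrix for each Preds column within each Group. I will need the actual confusion matrix tables, but ultimately I need to extract the $overall and $byClass elements and end up with something like the following.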

> conf_matrix
$Preds1
      Accuracy  Sensitivity  Specificity
 [1,] 0.73      0.8          0.6            
 [2,] 0.93      0.91         1              
 [3,] 0.87      0.83         1              
 [4,] 0.8       0.82         0.75
...............
[27,] 0.8       0.82         0.75           
[28,] 0.58      0.67         0.5            
[29,] 1         0.67         1              
[30,] 1         0            1

$Preds2
      Accuracy  Sensitivity  Specificity
 [1,] 0.73      0.8          0.6            
 [2,] 0.93      0.91         1              
 [3,] 0.87      0.83         1              
 [4,] 0.8       0.82         0.75    
...............
[27,] 0.8       0.82         0.75           
[28,] 0.58      0.67         0.5            
[29,] 1         0.67         1              
[30,] 1         0            1

$Preds3
...............

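I have tried the script below, but I keep running into problems with the second level of indexing, that of the Preds columns within each group. I believe it has to do with my nested lapply calls and how I am indexing, because the code works when I break it apart and step through it piece by piece.

I also tried doing this manually with table(), but abandoned that approach because it did not give results as consistent as those from confusionMatrix().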

lapply(seq_along(split(dat[3:5], list(dat$Group))), function(x) {
    x_temp <- split(dat[3:5], list(dat$Group))[[x]]
    lapply(seq_along(x_temp), function(x2) {
        x_temp <- x_temp[[x2]]
        lapply(seq_along(split(dat[2], list(dat$Group))), function(y) {
            y_temp <- split(dat[2], list(dat$Group))[[y]]
            lapply(seq_along(y_temp), function(y2) {
                y_temp <- y_temp[[y2]]
                confusionMatrix(x_temp, y_temp)
            })
        })
    })
})

I may be way off base here, so I am open to any suggestions and comments.

Answer 1

I am not sure about the final result you are after, but the confusion matrices themselves can be obtained as follows.

library(caret)
set.seed(10)
dat <- data.frame(Group = c(rep(1, 10), rep(2, 10), rep(3, 10)), Actual = round(runif(30, 0, 1)),
                  Preds1 = round(runif(30, 0, 1)), Preds2 = round(runif(30, 0, 1)), Preds3 = round(runif(30, 0, 1)))
dat[] <- lapply(dat, as.factor)

# split by group
dats <- split(dat[,-1], dat$Group)

cm <- do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    # confusionMatrix(data, reference): predictions first, then the actual values
    confusionMatrix(data = y, reference = actual)$table
  })
}))
cm[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 0
         1 4 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 1
         1 4 2
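For the $overall and $byClass part of the question, one possible sketch is shown below. It assumes the same simulated data as above and that a matrix of Accuracy, Sensitivity and Specificity per Preds column, with one row per group, is what is wanted. The names pred_cols, groups and metrics are made up for this illustration. Note that confusionMatrix() treats the first factor level (here "0") as the positive class unless told otherwise.

```r
library(caret)

set.seed(10)
dat <- data.frame(
  Group  = rep(1:3, each = 10),
  Actual = round(runif(30)),
  Preds1 = round(runif(30)),
  Preds2 = round(runif(30)),
  Preds3 = round(runif(30))
)

# Use the same two levels in every column so that each group's
# predictions and actual values are comparable factors.
dat[-1] <- lapply(dat[-1], factor, levels = c(0, 1))

pred_cols <- c("Preds1", "Preds2", "Preds3")   # hypothetical helper name
groups    <- split(dat, dat$Group)

# One matrix per Preds column; one row per group.
metrics <- lapply(setNames(pred_cols, pred_cols), function(p) {
  t(sapply(groups, function(g) {
    cm <- confusionMatrix(data = g[[p]], reference = g$Actual)
    c(cm$overall["Accuracy"],
      cm$byClass[c("Sensitivity", "Specificity")])
  }))
})

metrics$Preds1
```

Each element of metrics is then a plain numeric matrix, so the full confusionMatrix objects could be kept in a separate list if the tables are needed as well.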

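@Brian

In the linked question (What's the difference between lapply and do.call in R?), I found Paul Hiemstra's answer to be very straightforward.

- lapply is similar to map; do.call is not. lapply applies a function to every element of a list, whereas do.call calls a function once, with all of that function's arguments supplied in a list. So for an n-element list, lapply makes n function calls and do.call makes just one. do.call is therefore quite different from lapply.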
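As a small illustration of that difference, here is a sketch in base R. The helper counted_sum and the counter calls are made up for this example; they only exist to record how often the function is invoked.

```r
xs <- list(1, 2, 3)

calls <- 0
# Wrapper around sum() that counts how many times it is invoked.
counted_sum <- function(...) {
  calls <<- calls + 1
  sum(...)
}

per_element  <- lapply(xs, counted_sum)   # one call per list element
calls_lapply <- calls

calls <- 0
all_at_once  <- do.call(counted_sum, xs)  # a single call: counted_sum(1, 2, 3)
calls_docall <- calls
```

Here lapply() returns a list with one result per element, while do.call() returns the single value produced by passing all three elements as arguments at once.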

In the example, dats has three elements: 1, 2 and 3.

dats <- split(dat[,-1], dat$Group)
dats[1]
$`1`
   Actual Preds1 Preds2 Preds3
1       1      1      0      0
2       0      0      0      1
3       0      0      0      1
4       1      1      0      1
5       0      0      1      0
6       0      1      1      1
7       0      1      1      0
8       0      1      0      1
9       1      1      0      1
10      0      1      0      0

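Below is a double loop: the outer loop runs over groups 1, 2 and 3, and the inner loop runs over Preds1, Preds2 and Preds3. As a result, lapply() on its own produces a nested list, as shown below.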

lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    # confusionMatrix(data, reference): predictions first, then the actual values
    confusionMatrix(data = y, reference = actual)$table
  })
})[1]
$`1`
$`1`$Preds1
          Reference
Prediction 0 1
         0 3 0
         1 4 3

$`1`$Preds2
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1`$Preds3
          Reference
Prediction 0 1
         0 3 1
         1 4 2

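However, the nested list above is awkward to use later on, because another double loop is needed to reach each confusion matrix. do.call() simplifies this. Its first argument, c, is a function, and do.call() calls it once with the three per-group lists as its arguments. c() concatenates them into a single flat list whose elements are named 1.Preds1, 1.Preds2, 1.Preds3 and so on, so every confusion matrix can be reached with a single loop. In general, I tend to use do.call() whenever the structure of a list needs to change.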

do.call(c, lapply(dats, function(x) {
  actual <- x[, 1]
  lapply(x[, 2:4], function(y) {
    # confusionMatrix(data, reference): predictions first, then the actual values
    confusionMatrix(data = y, reference = actual)$table
  })
}))[1:3]
$`1.Preds1`
          Reference
Prediction 0 1
         0 3 0
         1 4 3

$`1.Preds2`
          Reference
Prediction 0 1
         0 4 3
         1 3 0

$`1.Preds3`
          Reference
Prediction 0 1
         0 3 1
         1 4 2
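The flattening step can also be seen on a toy list, independent of caret. This is only a sketch: nested, flat and flat_alt are illustrative names, and the character values stand in for the confusion matrix tables.

```r
# Toy stand-in for the nested result: groups on the outside,
# prediction columns on the inside.
nested <- list(
  `1` = list(Preds1 = "a", Preds2 = "b"),
  `2` = list(Preds1 = "c", Preds2 = "d")
)

# One call to c() with the inner lists as arguments: c(`1` = ..., `2` = ...)
flat <- do.call(c, nested)

# unlist() with recursive = FALSE removes one level of nesting as well.
flat_alt <- unlist(nested, recursive = FALSE)

# A single loop is now enough to visit every element.
for (nm in names(flat)) {
  cat(nm, ":", flat[[nm]], "\n")
}
```

Both approaches build names of the form group.column, which can be split again with strsplit(nm, ".", fixed = TRUE) if the group and column are needed separately.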
