Question

Apologies for the possibly suboptimal title; I couldn't come up with a better one.

Say I have a 3x5 matrix like this:

test.df <- matrix(rep(1:5, 3), nrow = 3)
test.df
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    2    5    3
[2,]    2    5    3    1    4
[3,]    3    1    4    2    5

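I want to return the index of the row that most often holds the maximum value across the columns. I can do this by combining which.max, apply, and table, as follows: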

which.max(
    table(
        apply(test.df, 2, which.max)
        )
    )

First, I apply which.max to each column:

apply(test.df, 2, which.max)
[1] 3 2 3 1 3

Then I apply table to the resulting vector, which gives the number of times each row was found to hold a column maximum.

table(
    apply(test.df, 2, which.max)
)
1 2 3 
1 1 3

Finally, I use which.max again to get the index of the row that holds the maximum most often.
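A minimal sketch of the three steps above wrapped into one function. The helper name most_frequent_max_row and the second matrix m2 are made up for illustration. One detail worth noting: which.max applied to a table returns a position within that table, and the row index is carried in the name of the result. The two coincide only when every row wins at least one column, so the sketch converts the name back to an integer.

```r
# Hypothetical helper; wraps the three steps described above.
most_frequent_max_row <- function(m) {
    winners <- apply(m, 2, which.max)   # row index of the maximum in each column
    counts <- table(winners)            # how often each row holds a column maximum
    # which.max() gives a position within `counts`; the row index is its name
    as.integer(names(which.max(counts)))
}

test.df <- matrix(rep(1:5, 3), nrow = 3)
most_frequent_max_row(test.df)   # 3

# Made-up example: row 1 never holds a maximum, so the table only has
# entries for rows 2 and 3
m2 <- matrix(c(1, 2, 3,
               1, 3, 2,
               1, 2, 3), nrow = 3)
unname(which.max(table(apply(m2, 2, which.max))))   # 2 (position in the table)
most_frequent_max_row(m2)                           # 3 (actual row index)
```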

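Unfortunately, I need to do this on roughly 20,000 matrices, some of which may contain thousands of rows. So I'd like to know whether there is a faster and/or more elegant solution, preferably one that takes advantage of R's matrix operations.

Thanks a lot!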

Answer 1

This solution using rowSums seems to offer a pretty decent speedup:

test.df <- matrix(rep(1:5, 3), nrow = 3)

original = function(m) {
    which.max(
        table(
            apply(m, 2, which.max)
        )
    )
}

row_sums = function(m) {
    # Mark each column's maximum as TRUE, then count per row how often it holds one
    which.max(rowSums(apply(m, 2, function(x) {x == max(x)})))
}

library(microbenchmark)

microbenchmark(original(test.df), row_sums(test.df))

Timing results:

Unit: microseconds
              expr    min      lq      mean median     uq      max neval
 original(test.df) 86.725 91.6320 107.19399 92.513 94.462 1376.445   100
 row_sums(test.df) 26.698 28.0895  54.30694 29.741 32.443 2378.536   100
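
One caveat, sketched below with a made-up matrix (tied is not from the question): the two functions agree on test.df, but they can disagree when a column contains ties. which.max picks only the first row that reaches the column maximum, whereas x == max(x) credits every tied row. Which behavior is wanted depends on the data, so treat this as an illustration rather than a correction.

```r
original <- function(m) which.max(table(apply(m, 2, which.max)))
row_sums <- function(m) which.max(rowSums(apply(m, 2, function(x) x == max(x))))

# Made-up matrix with ties: columns are (5, 5, 1), (2, 2, 2), (1, 3, 3)
tied <- matrix(c(5, 5, 1,
                 2, 2, 2,
                 1, 3, 3), nrow = 3)

# First maximum per column is in rows 1, 1, 2, so row 1 wins twice
unname(original(tied))   # 1

# Counting every tied row gives row totals 2, 3, 2, so row 2 wins
row_sums(tied)           # 2
```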

Answer 2

You can avoid looping over the columns with apply by using max.col from base R:

which.max(table(max.col(t(test.df))))
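
Two details may matter here, shown in the hedged sketch below. max.col breaks ties at random by default (ties.method = "random"), while which.max takes the first maximum, so passing ties.method = "first" makes the result reproducible and consistent with the approach in the question. Replacing table with tabulate and nbins = nrow(m) is my own substitution, not part of the answer above: it counts over every row, so the position returned by which.max is the row index even if some row never holds a maximum. The function name row_most_often_max and the matrix m2 are hypothetical.

```r
# Hypothetical helper; a variant of the max.col one-liner above.
row_most_often_max <- function(m) {
    # max.col() works row-wise, so transpose to get the winning row per column
    winners <- max.col(t(m), ties.method = "first")
    # tabulate() counts over 1..nrow(m), so the position is the actual row index
    which.max(tabulate(winners, nbins = nrow(m)))
}

test.df <- matrix(rep(1:5, 3), nrow = 3)
row_most_often_max(test.df)   # 3

# Made-up example where row 1 never holds a maximum
m2 <- matrix(c(1, 2, 3,
               1, 3, 2,
               1, 2, 3), nrow = 3)
row_most_often_max(m2)        # 3
```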
