Question

I have two data frames (built with `cbind`, so strictly speaking they are matrices):

a = c(1,1,3)
b = c(7,2,1)
c = c(2,4,2)

d1 = cbind(a,b,c)

d = c(2,1,6)
e = c(1,4,2)
f = c(4,8,4)

d2 = cbind(d,e,f)

How can I easily get a data frame containing the maximum value at each position, i.e.

_fun(d1,d2)

     a b c
[1,] 2 7 4
[2,] 1 4 8
[3,] 6 2 4

I can do this with a loop, but it is very slow for large data frames.

Thanks!

Answer 1

We can put the datasets in a `list` and call `do.call` with `pmax` as the function:

do.call(pmax, list(d1, d2))
#     a b c
#[1,] 2 7 4
#[2,] 1 4 8
#[3,] 6 2 4

Or use `pmax` directly:

pmax(d1, d2)
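Keeping the matrices in a list is what lets the same call scale past two inputs. The sketch below assumes a hypothetical third matrix `d3` (all 5s) alongside the question's data, and shows `Reduce(pmax, ...)` as an equivalent pairwise alternative. Because `pmax` copies attributes from its first argument, the result keeps `d1`'s column names.

```r
# Question's data: cbind() of numeric vectors gives numeric matrices
d1 <- cbind(a = c(1, 1, 3), b = c(7, 2, 1), c = c(2, 4, 2))
d2 <- cbind(d = c(2, 1, 6), e = c(1, 4, 2), f = c(4, 8, 4))
# Hypothetical third matrix of the same shape, filled with 5s (illustration only)
d3 <- matrix(5, nrow = 3, ncol = 3)
mats <- list(d1, d2, d3)

# One call over all list elements, equivalent to pmax(d1, d2, d3)
res_docall <- do.call(pmax, mats)
# Pairwise fold, equivalent to pmax(pmax(d1, d2), d3)
res_reduce <- Reduce(pmax, mats)
```

Both forms give the same matrix; `do.call` makes a single call, while `Reduce` applies `pmax` once per additional matrix.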

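EDIT: Based on @nicola's comment.

Using `pmax.int` may be faster, but because it drops all attributes (including the dimensions), the result has to be converted back to a `matrix`, which may cost some of that speed.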

matrix(pmax.int(d1, d2), dim(d1))
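A minimal check of the attribute-dropping behaviour on the question's data. This sketch passes `nrow`/`ncol` explicitly and also restores `dimnames`, since rebuilding with the dimensions alone would lose the column names that `pmax` keeps.

```r
d1 <- cbind(a = c(1, 1, 3), b = c(7, 2, 1), c = c(2, 4, 2))
d2 <- cbind(d = c(2, 1, 6), e = c(1, 4, 2), f = c(4, 8, 4))

v <- pmax.int(d1, d2)  # plain numeric vector: dim and dimnames are dropped
# Rebuild the matrix shape and restore d1's dimnames explicitly
res <- matrix(v, nrow = nrow(d1), ncol = ncol(d1), dimnames = dimnames(d1))
```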

Benchmark

set.seed(24)
m1 <- matrix(sample(0:9, 5000*5000, replace=TRUE), ncol=5000)
set.seed(48)
m2 <- matrix(sample(0:9, 5000*5000, replace=TRUE), ncol=5000)
akrun1 <- function() pmax(m1, m2)
akrun2 <- function() matrix(pmax.int(m1, m2), dim(m1))
colonel <- function() ifelse(m1 > m2, m1, m2)
system.time(akrun1())
#   user  system elapsed 
#  0.850   0.033   0.885 
system.time(akrun2())
#   user  system elapsed 
#  1.090   0.021   1.114 

system.time(colonel())
#   user  system elapsed 
#  5.049   0.336   5.395
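Timings are only comparable if the approaches return the same thing. A small sketch (matrix size and seed chosen arbitrarily) that checks the three benchmarked expressions agree before timing them:

```r
set.seed(1)
m1 <- matrix(sample(0:9, 100 * 100, replace = TRUE), ncol = 100)
m2 <- matrix(sample(0:9, 100 * 100, replace = TRUE), ncol = 100)

r1 <- pmax(m1, m2)                                     # keeps m1's dim
r2 <- matrix(pmax.int(m1, m2), nrow(m1), ncol(m1))     # dim rebuilt by hand
r3 <- ifelse(m1 > m2, m1, m2)                          # keeps dim of the test

# All three should be the same integer matrix
all_same <- identical(r1, r2) && identical(r1, r3)
```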

Answer 2

Or just use the vectorized `ifelse`:

ifelse(d1>d2, d1, d2)
#     a b c
#[1,] 2 7 4
#[2,] 1 4 8
#[3,] 6 2 4

Or a hand-written function (just to test speed):

func = function(d1, d2) {m=d2;m[d1>d2]=d1[d1>d2];m}
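The same idea as `func`, written out with comments and computing the comparison only once (the name `func_max` is hypothetical). One difference from `pmax` is worth knowing: because it starts from `d2`, the result carries `d2`'s column names (`d e f`), whereas `pmax(d1, d2)` keeps `d1`'s (`a b c`).

```r
func_max <- function(d1, d2) {
  m <- d2            # start from d2; its attributes (incl. dimnames) are kept
  idx <- d1 > d2     # logical mask of cells where d1 is larger, computed once
  m[idx] <- d1[idx]  # overwrite only those cells with d1's values
  m
}

d1 <- cbind(a = c(1, 1, 3), b = c(7, 2, 1), c = c(2, 4, 2))
d2 <- cbind(d = c(2, 1, 6), e = c(1, 4, 2), f = c(4, 8, 4))
res <- func_max(d1, d2)
```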

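And some benchmarks. In these single runs the hand-written function seems to be the fastest, but @akrun's solution is fast enough and should be fine for your problem: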

#> d2 = matrix(sample(9000000), ncol=3000)
#> d1 = matrix(sample(9000000), ncol=3000)
#> system.time(ifelse(d1>d2, d1, d2))
#   user  system elapsed 
#   2.13    0.37    2.49 
#> system.time(matrix(pmax.int(d1, d2), dim(d1)))
#   user  system elapsed 
#   0.44    0.00    0.43 
#> system.time(pmax(d1, d2))
#   user  system elapsed 
#   0.41    0.02    0.42 
#> system.time(do.call(pmax, list(d1, d2)))
#   user  system elapsed 
#   0.34    0.01    0.36 
#> system.time(func(d1,d2))
#   user  system elapsed 
#   0.32    0.03    0.36

Answer 3

You can also use `abind` to stack the matrices into a 3-D array and then use `apply` like this:

library(abind)

d3 <- abind(d1, d2, along = 3)
apply(d3, c(1, 2), max)
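If you would rather not depend on `abind`, the same array can be built in base R with `array()`, since both matrices are stored column-major. A sketch under that assumption; note that `apply` calls `max` once per cell, so on large inputs it is likely to be slower than the vectorized `pmax`.

```r
d1 <- cbind(a = c(1, 1, 3), b = c(7, 2, 1), c = c(2, 4, 2))
d2 <- cbind(d = c(2, 1, 6), e = c(1, 4, 2), f = c(4, 8, 4))

# Stack d1 and d2 as slices 1 and 2 of a 3-D array (dimnames are not carried over)
arr <- array(c(d1, d2), dim = c(dim(d1), 2))
# For each (row, column) cell, take the max across the third dimension
res <- apply(arr, c(1, 2), max)
```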
