R: Extracting the row and column of the current element when using the apply function

Time: 2018-10-05 14:09:57

Tags: r apply

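How can I get the row and column of the current element when using the apply function? For example, suppose I want to apply a function to every element of a matrix, where the row number and column number of the selected element are also variables in that function. A simple reproducible example is given below.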

mymatrix <- matrix(1:12, nrow=3, ncol=4)

I would like a function that does the following:

apply(mymatrix, c(1,2), function (x) sum(x, row_number, col_number))

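where row_number and col_number are the row and column numbers of the selected element in mymatrix. Note that my actual function is more complex than sum, so a robust solution would be appreciated.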

3 answers:

Answer 0: (score: 3)

I'm not sure what you're trying to do, but I would use a for loop here.

Pre-allocate the return matrix and this will be very fast.

ret <- mymatrix
for (i in 1:nrow(mymatrix))
    for (j in 1:ncol(mymatrix))
        ret[i, j] <- sum(mymatrix[i, j], i, j)
#     [,1] [,2] [,3] [,4]
#[1,]    3    7   11   15
#[2,]    5    9   13   17
#[3,]    7   11   15   19
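
Since the real function is more complex than sum, the loop above could be wrapped in a small helper that accepts any function of the value, row, and column. This is only a sketch: the names apply_with_index and f are hypothetical (not from the answer), and f is assumed to return a single numeric value.

```r
# Hypothetical helper: calls f(value, row, col) for every element of m.
# Assumes f returns one numeric value per call.
apply_with_index <- function(m, f) {
  ret <- matrix(NA_real_, nrow = nrow(m), ncol = ncol(m))  # pre-allocate the result
  for (i in seq_len(nrow(m))) {      # seq_len() also behaves for zero-row input
    for (j in seq_len(ncol(m))) {
      ret[i, j] <- f(m[i, j], i, j)
    }
  }
  ret
}

mymatrix <- matrix(1:12, nrow = 3, ncol = 4)
result <- apply_with_index(mymatrix, function(x, i, j) sum(x, i, j))
result
```

Keeping the loop in one place means only the function passed as f has to change when the per-element computation changes.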

Benchmark analysis 1

I was curious, so I ran a microbenchmark analysis to compare the methods; I used a larger 200x300 matrix.

mymatrix <- matrix(1:600, nrow = 200, ncol = 300)
library(microbenchmark)
res <- microbenchmark(
    for_loop = {
        ret <- mymatrix
        for (i in 1:nrow(mymatrix))
            for (j in 1:ncol(mymatrix))
                ret[i, j] <- sum(mymatrix[i, j], i, j)
    },
    expand_grid_mapply = {
        newResult<- mymatrix
        grid1 <- expand.grid(1:nrow(mymatrix),1:ncol(mymatrix))
        newResult[]<-
        mapply(function(row_number, col_number){ sum(mymatrix[row_number, col_number], row_number, col_number) },row_number = grid1$Var1, col_number = grid1$Var2 )
    },
    expand_grid_apply = {
        newResult<- mymatrix
        grid1 <- expand.grid(1:nrow(mymatrix),1:ncol(mymatrix))
        newResult[]<-
        apply(grid1, 1, function(x){ sum(mymatrix[x[1], x[2]], x[1], x[2]) })
    },
    double_sapply = {
        sapply(1:ncol(mymatrix), function (x) sapply(1:nrow(mymatrix), function (y) sum(mymatrix[y,x],x,y)))
    }
)

res
#Unit: milliseconds
#               expr       min        lq      mean    median       uq       max
#           for_loop  41.42098  52.72281  56.86675  56.38992  59.1444  82.89455
# expand_grid_mapply 126.98982 161.79123 183.04251 182.80331 196.1476 332.94854
#  expand_grid_apply 295.73234 354.11661 375.39308 375.39932 391.6888 562.59317
#      double_sapply  91.80607 111.29787 120.66075 120.37219 126.0292 230.85411

library(ggplot2)
autoplot(res)


Benchmark analysis 2 (with expand.grid outside of microbenchmark)

grid1 <- expand.grid(1:nrow(mymatrix),1:ncol(mymatrix))
res <- microbenchmark(
    for_loop = {
        ret <- mymatrix
        for (i in 1:nrow(mymatrix))
            for (j in 1:ncol(mymatrix))
                ret[i, j] <- sum(mymatrix[i, j], i, j)
    },
    expand_grid_mapply = {
        newResult<- mymatrix
        newResult[]<-
        mapply(function(row_number, col_number){ sum(mymatrix[row_number, col_number], row_number, col_number) },row_number = grid1$Var1, col_number = grid1$Var2 )
    },
    expand_grid_apply = {
        newResult<- mymatrix
        newResult[]<-
        apply(grid1, 1, function(x){ sum(mymatrix[x[1], x[2]], x[1], x[2]) })
    }
)

res
#Unit: milliseconds
#               expr       min        lq      mean    median        uq      max
#           for_loop  39.65599  54.52077  60.87034  59.19354  66.64983  95.7890
# expand_grid_mapply 130.33573 167.68201 194.39764 186.82411 209.33490 400.9273
#  expand_grid_apply 296.51983 373.41923 405.19549 403.36825 427.41728 597.6937

Answer 1: (score: 1)

That is not how apply works: you cannot access the current index (row and column index) from inside the [lsvm]?apply family.

You will have to create the row and column indices yourself before applying. See ?expand.grid

mymatrix <- matrix(1:12, nrow=3, ncol=4)
newResult<- mymatrix

grid1 <- expand.grid(1:nrow(mymatrix),1:ncol(mymatrix))

newResult[]<-
mapply(function(row_number, col_number){ sum(mymatrix[row_number, col_number], row_number, col_number) },row_number = grid1$Var1, col_number = grid1$Var2 )

newResult

#     [,1] [,2] [,3] [,4]
#[1,]    3    7   11   15
#[2,]    5    9   13   17
#[3,]    7   11   15   19

If you want to use apply:

newResult[]<-    
apply(grid1, 1, function(x){ sum(mymatrix[x[1], x[2]], x[1], x[2]) })
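
As a possible alternative to expand.grid, base R's row() and col() return index matrices with the same shape as their argument, so they can be passed to mapply alongside the data. The sketch below is not part of the answer; f is a hypothetical stand-in for the more complex function.

```r
mymatrix <- matrix(1:12, nrow = 3, ncol = 4)

# row() and col() give, for each element, its row index and column index.
r_idx <- row(mymatrix)
c_idx <- col(mymatrix)

# Hypothetical stand-in for a more complex, non-vectorized function.
f <- function(x, row_number, col_number) sum(x, row_number, col_number)

# mapply walks the three matrices in parallel, element by element
# (column-major order), so each call sees a value with its own indices.
newResult <- mymatrix
newResult[] <- mapply(f, mymatrix, r_idx, c_idx)
newResult
```

Assigning into newResult[] keeps the original dimensions, as in the answer's own code.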

Answer 2: (score: 1)

Here is my idea, using the outer() function.

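The third argument, FUN, can be any function of two arguments, as long as it is vectorized: outer() calls it once with the fully expanded index vectors rather than once per pair.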

mymatrix <- matrix(1:12, nrow = 3, ncol = 4)
nr <- nrow(mymatrix)
nc <- ncol(mymatrix)
mymatrix + outer(1:nr, 1:nc, FUN = "+")

     [,1] [,2] [,3] [,4]
[1,]    3    7   11   15
[2,]    5    9   13   17
[3,]    7   11   15   19
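
If the real function is not vectorized, passing it directly to outer() fails, because outer() expects one result per index pair from a single call. A minimal sketch of one way around this, assuming a hypothetical scalar-only function f(i, j) that looks up the element itself, is to wrap it with Vectorize():

```r
mymatrix <- matrix(1:12, nrow = 3, ncol = 4)

# Hypothetical scalar-only function: sum() collapses its inputs to a single
# value, so it only gives the intended answer for one (i, j) pair at a time.
f <- function(i, j) sum(mymatrix[i, j], i, j)

# Vectorize() wraps f so that it is called once per (i, j) pair via mapply.
result <- outer(seq_len(nrow(mymatrix)), seq_len(ncol(mymatrix)),
                FUN = Vectorize(f))
result
```

The wrapper calls f once per element, so it would not be expected to match the speed of the fully vectorized "+" version.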

Using @Maurits Evers's benchmark code:

Unit: microseconds
     expr       min         lq      mean    median        uq        max
 for_loop 19963.203 22427.1630 25308.168 23811.855 25017.031 158341.678
    outer   848.247   949.3515  1054.944  1011.457  1059.217   1463.956

In addition, I tried to complete your original idea with apply(X, c(1,2), function (x)):

(slightly slower than the other answers)

mymatrix <- matrix(1:12, nrow = 3, ncol = 4)
n <- 1                                        # n = index of data
nr <- nrow(mymatrix)
apply(mymatrix, c(1,2), function (x) {
  row_number <- (n-1) %% nr + 1               # convert n to row number
  col_number <- (n-1) %/% nr + 1              # convert n to column number
  res <- sum(x, row_number, col_number)
  n <<- n + 1
  return(res)
})

     [,1] [,2] [,3] [,4]
[1,]    3    7   11   15
[2,]    5    9   13   17
[3,]    7   11   15   19
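
The counter approach above relies on apply visiting the elements in column-major order (the order in which R stores matrices), which is consistent with the output shown. As a rough sketch, the same index arithmetic can be checked against base R's arrayInd() and used without a global counter:

```r
mymatrix <- matrix(1:12, nrow = 3, ncol = 4)
nr <- nrow(mymatrix)
n <- seq_along(mymatrix)             # linear (column-major) positions 1..12

# Same arithmetic as in the answer, applied to all positions at once.
row_number <- (n - 1) %% nr + 1
col_number <- (n - 1) %/% nr + 1

# arrayInd() converts linear positions into (row, column) pairs.
idx <- arrayInd(n, dim(mymatrix))
stopifnot(all(idx[, 1] == row_number), all(idx[, 2] == col_number))

# Use the computed indices directly instead of updating n with <<-.
result <- mymatrix
result[] <- mapply(function(x, i, j) sum(x, i, j),
                   mymatrix, row_number, col_number)
result
```

Avoiding <<- keeps the function free of side effects, so the result does not depend on the counter being reset before each run.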