如何将具有多个参数的函数的观测值绑定到行中?

时间:2018-02-04 15:16:07

标签: r function lapply rbind

我想做一个模拟来证明一些事情。因此,我创建了一个生成一个观察的函数,与下面的内容非常类似。

set.seed(458007)

fun1 <- function(M, x, y){
  M <- matrix(NA, nrow = 1, ncol = 4)
  M[, 1] <- x
  M[, 2] <- y
  M[, 3] <- mean(rnorm(x, 0, y))
  M[, 4] <- mean(rnorm(x, 1, y))
  return(M)
}

fun1(x=1e3, y=.5)
#      [,1] [,2]        [,3]      [,4]
# [1,] 1000  0.5 0.001414806 0.9875602

对于现在的模拟,我想将带有不同参数的重复观察绑定到行中,并选择lapply()方法。虽然我带来了以下代码,但具有讽刺意味的是它是最慢的代码。请注意,在related question中,我略微过分了我的问题的例子,这实际上有点嵌套。也许有人可以帮我找到更快的解决方案吗?

#  lapply
do.call(rbind, lapply(c(.5, 1, 1.5), function(y)
  do.call(rbind, lapply(c(1e3, 1e4, 1e5), 
         function(x) do.call(rbind, lapply(1:5, fun1, x, y))))))
#        [,1] [,2]          [,3]      [,4]
#  [1,] 1e+03  0.5  0.0156969547 0.9878933
#  [2,] 1e+03  0.5  0.0187202908 1.0011313
#  [3,] 1e+03  0.5 -0.0351017539 0.9953701
#  [4,] 1e+03  0.5 -0.0129749736 1.0112514
#  [5,] 1e+03  0.5 -0.0154776052 0.9793552
#  [6,] 1e+04  0.5 -0.0070121049 1.0022838
#  [7,] 1e+04  0.5 -0.0064961931 0.9967966
#  [8,] 1e+04  0.5 -0.0054208002 0.9955582
#  [9,] 1e+04  0.5 -0.0027074479 1.0019217
# [10,] 1e+04  0.5  0.0047017946 1.0069838
# [11,] 1e+05  0.5 -0.0018550320 0.9981459
# [12,] 1e+05  0.5 -0.0019201731 0.9973762
# [13,] 1e+05  0.5 -0.0031555017 1.0016808
# [14,] 1e+05  0.5 -0.0005508661 1.0001200
# [15,] 1e+05  0.5  0.0002928878 0.9991147
# [16,] 1e+03  1.0  0.0043441072 0.9579204
# [17,] 1e+03  1.0 -0.0059409534 1.0068553
# [18,] 1e+03  1.0  0.0850053171 1.0316056
# [19,] 1e+03  1.0 -0.0145192268 1.0193467
# [20,] 1e+03  1.0  0.0104437603 0.9959815
# [21,] 1e+04  1.0  0.0252303898 0.9968866
# [22,] 1e+04  1.0  0.0039449755 0.9818866
# [23,] 1e+04  1.0  0.0145974970 0.9814802
# [24,] 1e+04  1.0 -0.0016105680 0.9968357
# [25,] 1e+04  1.0  0.0058877101 1.0049794
# [26,] 1e+05  1.0  0.0015416062 1.0008094
# [27,] 1e+05  1.0  0.0004725605 1.0001917
# [28,] 1e+05  1.0 -0.0007963141 1.0019771
# [29,] 1e+05  1.0 -0.0007302225 0.9969158
# [30,] 1e+05  1.0  0.0023877190 1.0060436
# [31,] 1e+03  1.5  0.0165765473 0.9391917
# [32,] 1e+03  1.5 -0.0990828503 1.0256720
# [33,] 1e+03  1.5  0.0526152728 0.9981981
# [34,] 1e+03  1.5  0.1472273215 0.9442844
# [35,] 1e+03  1.5  0.0346540383 1.0316669
# [36,] 1e+04  1.5 -0.0007479431 0.9800219
# [37,] 1e+04  1.5  0.0189053160 1.0284075
# [38,] 1e+04  1.5  0.0062155928 0.9821324
# [39,] 1e+04  1.5 -0.0065533501 1.0085699
# [40,] 1e+04  1.5 -0.0161694486 1.0126392
# [41,] 1e+05  1.5 -0.0090145992 0.9952551
# [42,] 1e+05  1.5 -0.0024756213 1.0054282
# [43,] 1e+05  1.5  0.0061985946 0.9966108
# [44,] 1e+05  1.5  0.0023640342 0.9988624
# [45,] 1e+05  1.5  0.0014610948 0.9956877

@ lefft和@ Parfait解决方案的基准(来自示例)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval
#       mine 325.8589 347.1616 405.6944 398.6682 434.9392 906.7906   100
#      lefft 327.6870 348.3504 393.7511 393.2127 421.4536 694.1610   100
#    Parfait 323.5595 343.5806 396.9973 390.9864 423.0759 736.2887   100

1 个答案:

答案 0 :(得分:1)

这是一个很好的紧凑方式:

# create the input data *before* applying the func to it 
dat <- expand.grid(Ms=1:5, xs=c(1e3, 1e4, 1e5), ys=c(.5, 1, 1.5))

# apply the function to each row of the data frame (`MARGIN=1` is row-wise) 
# (the outermost function `t()` transposes the result so it looks like yours)
t(apply(dat, MARGIN=1, FUN=function(row) fun1(M=row[1], x=row[2], y=row[3])))

它比原始解决方案快10%左右,但差异可能与更大的数据不同。需要注意的一点是,如果你首先创建param网格(就像这里所做的那样),那么用fun1()计算所花费的时间可以更容易地被隔离(即你可以知道需要花费很长时间的东西 - 计算或创建输入数据框)。

希望这有帮助!