Question

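Is there any general way to make the following R code run faster? In Python, for example, a dict comprehension (see the equivalent below) would be a better and faster alternative.

R: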

l1 <- 1:3
l2 <- c("a", "b", "c")
foo <- function(x) {return(5*x)}
bar <- list()
for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])

Python:

l1 = range(1, 4)
l2 = ["a", "b", "c"]
def foo(x):
    return 5*x
{b: foo(a) for a, b in zip(l1, l2)}
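For comparison, the closest R analogue to this `zip`-based comprehension is probably `Map()`, which walks several vectors in parallel. This is a minimal sketch using the `l1`, `l2` and `foo` defined above; the inner function's argument names `key` and `value` are illustrative only. Because `l2` is a character vector and is passed first, `Map()` uses it as the names of the result.

```r
l1 <- 1:3
l2 <- c("a", "b", "c")
foo <- function(x) {return(5*x)}

# Map(f, ...) is mapply(f, ..., SIMPLIFY = FALSE): it iterates over l2 and l1
# in parallel, much like zip(). Since the first argument (l2) is a character
# vector, mapply's USE.NAMES = TRUE turns it into the names of the list.
bar <- Map(function(key, value) foo(value), l2, l1)

str(bar)  # a named list: a = 5, b = 10, c = 15
```

This still calls `foo` once per element, so it reads like the comprehension but is not expected to be faster than `lapply`.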

Answer 1

We're talking about speed, so let's do some benchmarking:

library(microbenchmark)
microbenchmark(op = {for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1,foo),l2),
               vectorised = setNames(as.list(foo(l1)), l2))

Unit: microseconds
       expr   min    lq     mean median     uq    max neval
         op 7.982 9.122 10.81052  9.693 10.548 36.206   100
     lapply 5.987 6.557  7.73159  6.842  7.270 55.877   100
 vectorised 4.561 5.132  6.72526  5.417  5.987 80.964   100
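Before reading too much into the timings, it is worth confirming that the three expressions build the same object; otherwise the benchmark compares different work. A quick check, assuming the three-element `l1`, `l2` and `foo` from the question (`op`, `via_lapply` and `vectorised` are just illustrative variable names):

```r
l1 <- 1:3
l2 <- c("a", "b", "c")
foo <- function(x) {return(5*x)}

# The original loop, starting from an empty list
op <- list()
for (i in 1:length(l1)) op[l2[i]] <- foo(l1[i])

# lapply: call foo on each element, then attach the names
via_lapply <- setNames(lapply(l1, foo), l2)

# Vectorised: 5*x already works on whole vectors, so call foo once
# and split the result into a named list afterwards
vectorised <- setNames(as.list(foo(l1)), l2)

# All three should be the same named list: a = 5, b = 10, c = 15
identical(op, via_lapply)
identical(op, vectorised)
```

The vectorised version wins because it makes a single call to `foo` instead of one R-level function call per element.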

But at these small sizes the differences don't really matter, so I raised the vector length to 10,000, where you really see the difference:

l <- 10000
l1 <- seq_len(l)
l2 <- sample(letters, l, replace = TRUE)

microbenchmark(op = {bar <- list(); for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1,foo),l2),
               vectorised = setNames(as.list(foo(l1)), l2),
               times = 100)

Unit: microseconds
       expr       min        lq       mean     median        uq       max neval
         op 30122.865 33325.788 34914.8339 34769.8825 36721.428 41515.405   100
     lapply 13526.397 14446.078 15217.5309 14829.2320 15351.933 19241.767   100
 vectorised   199.559   259.997   349.0544   296.9155   368.614  3189.523   100
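One caveat about this larger benchmark: `sample(letters, l, replace = TRUE)` produces repeated keys, and the approaches treat them differently. The loop overwrites an existing name (like a Python dict), so it ends up with at most 26 elements, while `setNames()` keeps all 10,000 elements with duplicated names. A small sketch to see this; the `set.seed()` call is a hypothetical addition only for reproducibility, and `vec` is an illustrative name:

```r
foo <- function(x) {return(5*x)}
set.seed(1)  # hypothetical seed, not in the original benchmark
l <- 10000
l1 <- seq_len(l)
l2 <- sample(letters, l, replace = TRUE)

# Loop: assigning to a name that already exists overwrites it,
# so each letter keeps only its last value
bar <- list()
for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])

# Vectorised: every element is kept, so names repeat
vec <- setNames(as.list(foo(l1)), l2)

length(bar)                    # number of distinct letters drawn (at most 26)
length(vec)                    # 10000
anyDuplicated(names(vec)) > 0  # TRUE: many elements share a name
```

Note also that `vec[["a"]]` returns the *first* element named `"a"`, whereas the loop keeps the *last* value assigned to `"a"`.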

But to echo what others have said, it doesn't have to be a list. If you drop the list requirement:

microbenchmark(setNames(foo(l1), l2))

Unit: microseconds
                  expr    min      lq     mean  median     uq      max neval
 setNames(foo(l1), l2) 22.522 23.8045 58.06888 25.0875 48.322 1427.417   100
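If you actually need Python-dict behaviour from a named vector (one entry per key, the last assignment wins, keys in order of first appearance), one vectorised sketch is to drop earlier duplicates with `duplicated(..., fromLast = TRUE)`. It uses small hypothetical `l1` and `l2` with repeated keys so the effect is visible; `vals`, `keep` and `d` are illustrative names.

```r
foo <- function(x) {return(5*x)}
l1 <- 1:5
l2 <- c("a", "b", "a", "c", "b")   # hypothetical keys with repeats

vals <- foo(l1)                            # one vectorised call: 5 10 15 20 25
keep <- !duplicated(l2, fromLast = TRUE)   # TRUE at the last occurrence of each key
d <- setNames(vals[keep], l2[keep])        # one entry per key, last value wins
d <- d[unique(l2)]                         # reorder keys by first appearance, like a dict

d          # a = 15, b = 25, c = 20
d[["b"]]   # 25
```

Because the names in `d` are unique after this step, lookups such as `d[["b"]]` behave like dict lookups.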

Dictionaries and list comprehensions in R
