Question

Data: Suppose I have a list named S whose elements are meant to be matrices. It can be generated as follows:

S<-list(c(1:25),c(1:25),c(1:25),c(1:25))

Here is a working approach that I would like to optimize:

for (i in 1:length(S))
{
  dim(S[[i]])<-c(5,5)
}

After searching online, I tried using lapply to apply a function over the list. This is the code I tried:

mat<-lapply(S, function(x) dim(x)<-c(5,5))

It only returns the dimension vectors, not the matrices:

> mat
[[1]]
[1] 5 5

[[2]]
[1] 5 5

[[3]]
[1] 5 5

[[4]]
[1] 5 5

Question: Is there a built-in function that applies a function over a list without needing a return, or is there a mistake in my code?

Thanks in advance.

Answer 1

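Building on your attempt: the anonymous function returns the value of its last expression, and for an assignment that is the assigned value c(5, 5), not the modified x. You need to return x, either explicitly or implicitly: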

lapply(S, function(x) { dim(x) <- c(5, 5); return(x) })
lapply(S, function(x) { dim(x) <- c(5, 5); x; })
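To make the difference concrete, here is a minimal sketch comparing the original attempt with the corrected one. The names f_wrong, f_right, wrong and right are only illustrative and do not appear in the question.

```r
S <- list(1:25, 1:25, 1:25, 1:25)

# The assignment is the last expression, so the assigned value
# (the dimension vector) becomes the function's return value.
f_wrong <- function(x) dim(x) <- c(5, 5)

# Ending with x returns the modified object instead.
f_right <- function(x) { dim(x) <- c(5, 5); x }

wrong <- lapply(S, f_wrong)
right <- lapply(S, f_right)

print(wrong[[1]])             # the dimension vector, as in the question
print(is.matrix(right[[1]]))  # the corrected version yields a matrix
print(dim(right[[1]]))
```

Note that lapply returns a new list in both cases. To keep the result, assign it, for example S <- lapply(S, f_right).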

Or, more concisely, recast each list entry as a matrix:

lapply(S, function(x) matrix(x, 5, 5))

Or use purrr::map (after library(purrr)):

map(S, ~ matrix(., 5, 5))
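As a quick sanity check, the dim-based and matrix-based variants should produce the same result, because both fill column by column. The sketch below uses base R only; via_dim and via_matrix are illustrative names.

```r
S <- list(1:25, 1:25, 1:25, 1:25)

via_dim    <- lapply(S, function(x) { dim(x) <- c(5, 5); x })
via_matrix <- lapply(S, function(x) matrix(x, 5, 5))

# Both approaches fill the matrix in column-major order.
print(identical(via_dim, via_matrix))

# The first column holds the first five values of the input vector.
print(via_matrix[[1]][, 1])

# If row-wise filling is wanted instead, matrix() offers byrow = TRUE,
# which setting dim alone cannot do.
by_row <- matrix(S[[1]], 5, 5, byrow = TRUE)
print(by_row[1, ])
```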

Benchmark comparison

[Edited by @HunterJiang]

library(microbenchmark)
library(purrr)
library(ggplot2)
N<-30
M<-30
S<-list(c(1:(N*M)),c(1:(N*M)),c(1:(N*M)),c(1:(N*M)))
mb <- microbenchmark(
  for_loop = { for (i in 1:length(S)) dim(S[[i]])<-c(N,M) },
  dim_plus_return = { S1<-lapply(S, function(x) { dim(x) <- c(N,M); return(x) }) },
  cast_matrix = { S1<-lapply(S, function(x) matrix(x, N,M)) },
  purrr_map = { S1<-map(S, ~ matrix(.,N,M)) },
  set_dim_directly = { S1<-lapply(S, `dim<-`, c(N,M)) }
)
mb
ggplot(mb, aes(expr, log10(time))) + 
  geom_boxplot() + 
  labs(y = "Time in log10 nanosec", x = "Method")

When N and M are small, e.g. N = M = 30, the timings are:

Unit: microseconds
             expr      min       lq       mean    median        uq      max neval
         for_loop 2111.950 2236.298 2537.42270 2328.4735 2484.2055 4581.549   100
  dim_plus_return   10.264   12.633   32.91945   16.1855   19.3440 1641.794   100
      cast_matrix   11.054   13.423   27.40873   16.3830   18.9490 1068.213   100
        purrr_map   70.662   77.768   99.41636   93.1640  112.9015  199.748   100
 set_dim_directly    5.527    6.909    8.47230    7.8960    9.6720   22.502   100

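But when N and M become large, e.g. N = M = 3000, the lapply variants lose their advantage and a for loop may be a reasonable choice: its mean time is the lowest (about 53 ms versus 68 to 89 ms), although its median is on par with the two dim-based lapply variants (about 40 ms).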

Unit: milliseconds
             expr       min       lq     mean   median        uq      max neval
         for_loop  2.224456 20.83191 52.76189 41.72521  69.91993 180.9775   100
  dim_plus_return 35.930768 37.57671 68.63905 39.31620  74.14185 193.8300   100
      cast_matrix 48.220338 51.16917 79.73308 52.37871  87.31804 199.2859   100
        purrr_map 49.534089 51.21635 89.11881 61.12987 101.98780 195.1374   100
 set_dim_directly 35.151124 37.71112 67.72032 39.91919  74.97617 184.4943   100

Conclusion: S1 <- lapply(S, `dim<-`, c(N, M)) works well for small datasets; when the dimensions are very large, a for loop may be faster.
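For readers unfamiliar with the backtick form: `dim<-` is the replacement function behind dim(x) <- value. Called directly, it takes the object and the new dimensions and returns the reshaped object, which is why it can be passed to lapply without a wrapper. A minimal sketch follows, with x, y and mats as illustrative names.

```r
S <- list(1:25, 1:25, 1:25, 1:25)

# Calling the replacement function directly returns the reshaped object.
x <- 1:6
y <- `dim<-`(x, c(2, 3))
print(y)

# lapply passes each list element as the first argument and
# c(5, 5) as the second, so no anonymous function is needed.
mats <- lapply(S, `dim<-`, c(5, 5))
print(dim(mats[[1]]))
```

This is the same call as set_dim_directly in the benchmark above, with fixed dimensions in place of N and M.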
