dplyr::select_ is very slow on short, wide data frames

Posted: 2017-07-31 16:44:15

Tags: r performance tidyr

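I ran into an odd case: I was repeatedly running select_ on short (~20 rows), wide (~20K columns) data.frames to pull out about 50 columns, and noticed that dplyr::select_ runs extremely slowly compared with plain base R indexing.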

library(dplyr)  # provides select_() and the %>% pipe

df <- cbind(1:20, matrix(rnorm(20000 * 20), nrow = 20, ncol = 20000)) %>%
    data.frame
names(df) <- c("id", paste0("v", 1:20000))
selectcols <- c("id", paste0("v", sample(1:20000, 50)))

system.time(dfshort1 <- df[,selectcols])
#    user  system elapsed 
#   0.004   0.000   0.000 

system.time(dfshort2 <- df %>%
                select_(.dots = paste0("~", selectcols) %>%
                            lapply(as.formula)))
#    user  system elapsed 
#  69.752   0.008  69.773

identical(data.frame(dfshort1), data.frame(dfshort2))
#    TRUE

Any idea why this happens?
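For comparison, here is a minimal sketch of the same selection using the non-underscore select() together with all_of(), which takes the column names as a plain character vector instead of one formula per column. It assumes dplyr >= 1.0.0 (where all_of() is available and select_() is deprecated), and the fixed seed is an addition made only for reproducibility. It does not explain the slowdown, and whether it avoids it depends on the installed dplyr version, so time it with system.time() as in the question.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(cbind(1:20, matrix(rnorm(20000 * 20), nrow = 20, ncol = 20000)))
names(df) <- c("id", paste0("v", 1:20000))
selectcols <- c("id", paste0("v", sample(1:20000, 50)))

# Base R indexing, as in the question
dfshort1 <- df[, selectcols]

# Names passed as a character vector; all_of() errors if any name is missing
# (assumes dplyr >= 1.0.0)
system.time(dfshort3 <- df %>% select(all_of(selectcols)))

# Same equality check as in the question
identical(data.frame(dfshort1), data.frame(dfshort3))
```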
