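I ran into a strange case: I was repeatedly running select_ on short (~20 rows), wide (~20K columns) data.frames to pull out about 50 columns, and noticed that dplyr::select_ runs very slowly.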
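library(dplyr)  # provides %>% and select_
# Build a 20-row data.frame with an id column plus 20,000 random columns
df <- cbind(1:20, matrix(rnorm(20000 * 20), nrow = 20, ncol = 20000)) %>%
  data.frame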
names(df) <- c("id", paste0("v", 1:20000))
selectcols <- c("id", paste0("v", sample(1:20000, 50)))
system.time(dfshort1 <- df[,selectcols])
# user system elapsed
# 0.004 0.000 0.000
system.time(dfshort2 <- df %>%
  select_(.dots = paste0("~", selectcols) %>%
    lapply(as.formula)))
# user system elapsed
# 69.752 0.008 69.773
identical(data.frame(dfshort1), data.frame(dfshort2))
# TRUE
Any idea why this happens?