Loop over many columns and apply the same dplyr function to each

Asked: 2016-03-28 22:11:23

Tags: r dplyr

Suppose I have a data frame like this in R:

df <- data.frame(factor1 = c("A","B","B","C"),
                factor2 = c("M","F","F","F"),
                factor3 = c("0", "1","1","0"),
                value = c(23,32,4,1))

I want to compute a summary statistic grouped by one variable in dplyr, like this (but more complex):

library(dplyr)

df %>%
    group_by(factor1) %>%
    summarize(mean = mean(value))

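Now I want to do this for every factor column (think 100 factor variables). Is there a way to do that in dplyr? I also considered a for loop over names(df), but then I have the variable names as strings, and group_by() does not accept strings.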

1 Answer:

Answer 0: (score: 5)

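Just reshape your data into long format, so that every factor column is stacked into one pair of key/value columns. A single group_by() then covers all of them.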

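library(dplyr)
library(tidyr)

# Stack every column except `value` into (factor, level) pairs,
# then summarize once per factor/level combination.
df %>% gather(key = factor, value = level, -value) %>%
    group_by(factor, level) %>%
    summarize(mean = mean(value))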

#    factor level     mean
#     (chr) (chr)    (dbl)
# 1 factor1     A 23.00000
# 2 factor1     B 18.00000
# 3 factor1     C  1.00000
# 4 factor2     F 12.33333
# 5 factor2     M 23.00000
# 6 factor3     0 12.00000
# 7 factor3     1 18.00000
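
For readers on a newer tidyr, here is a minimal sketch of the same reshape using pivot_longer(), which superseded gather() in tidyr 1.0.0. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument); the names long_summary, "factor", and "level" are just illustrative choices mirroring the answer above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1))

# Assumption: tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed.
long_summary <- df %>%
    # Stack every column except `value` into (factor, level) pairs.
    pivot_longer(-value, names_to = "factor", values_to = "level") %>%
    group_by(factor, level) %>%
    # .groups = "drop" returns an ungrouped result.
    summarize(mean = mean(value), .groups = "drop")

long_summary
```

The result should have one row per factor/level combination, matching the table above.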

To actually build a loop, the Programming with dplyr vignette is the right place to start.
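
As a rough sketch of what such a loop could look like, the example below iterates over the column names as strings and uses the .data pronoun to refer to each column inside group_by(). This is one possible approach, not necessarily the one the answerer had in mind; it assumes a dplyr version that supports .data (0.7.0 or later), and the names factor_cols, results, and combined are hypothetical.

```r
library(dplyr)

df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value = c(23, 32, 4, 1))

# Every column except the measured one is treated as a grouping factor.
factor_cols <- setdiff(names(df), "value")

# One summary per factor column; .data[[col]] looks the column up by its
# string name, which is what a bare group_by(col) cannot do.
results <- lapply(factor_cols, function(col) {
    df %>%
        # as.character() keeps the `level` column type consistent across
        # iterations, so the pieces can be stacked afterwards.
        group_by(level = as.character(.data[[col]])) %>%
        summarize(mean = mean(value))
})
names(results) <- factor_cols

# Stack the per-factor summaries; the list names become the `factor` column.
combined <- bind_rows(results, .id = "factor")
combined
```

Keeping results as a named list is handy when each factor needs a different follow-up step; bind_rows() is only needed if a single table like the one in the answer is wanted.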