Suppose I have a data frame like this in R:
df <- data.frame(factor1 = c("A","B","B","C"),
factor2 = c("M","F","F","F"),
factor3 = c("0", "1","1","0"),
value = c(23,32,4,1))
I would like to get summary statistics grouped by a single variable in dplyr, like this (but more complex):
library(dplyr)
df %>%
group_by(factor1) %>%
summarize(mean = mean(value))
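Now I want to do this for every factor column (think 100 factor variables). Is there a way to do this in dplyr? I also considered a for loop over names(df), but then I have the variable names as strings, and group_by() does not accept strings.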
Answer 0 (score: 5)
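Just gather your data into long format, so that every factor column is stacked into one pair of columns: the factor name and its level.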
library(tidyr)
df %>% gather(key = factor, value = level, -value) %>%
group_by(factor, level) %>%
summarize(mean = mean(value))
# factor level mean
# (chr) (chr) (dbl)
# 1 factor1 A 23.00000
# 2 factor1 B 18.00000
# 3 factor1 C 1.00000
# 4 factor2 F 12.33333
# 5 factor2 M 23.00000
# 6 factor3 0 12.00000
# 7 factor3 1 18.00000
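In recent tidyr releases gather() is marked as superseded in favour of pivot_longer(). The following is a minimal sketch of the same reshaping with that function, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). The name long_means is only illustrative, and the sketch assumes that all factor columns share a compatible type, as they do in the example data.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value   = c(23, 32, 4, 1))

# Stack every column except `value` into (factor, level) pairs,
# then compute one mean per factor/level combination.
long_means <- df %>%
  pivot_longer(cols = -value, names_to = "factor", values_to = "level") %>%
  group_by(factor, level) %>%
  summarize(mean = mean(value), .groups = "drop")

print(long_means)
```

The result has one row per factor/level combination, like the gather() output above.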
To actually build a loop, the "Programming with dplyr" vignette is the right place to start.
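A loop over the column names could be sketched as follows. It relies on the .data pronoun, which lets group_by() look up a column from a string. This assumes a reasonably recent dplyr (1.0.0 or later for .groups). The names factor_cols and summaries are hypothetical helpers introduced for illustration and are not part of the original question.

```r
library(dplyr)

# Example data from the question
df <- data.frame(factor1 = c("A", "B", "B", "C"),
                 factor2 = c("M", "F", "F", "F"),
                 factor3 = c("0", "1", "1", "0"),
                 value   = c(23, 32, 4, 1))

# Every column except the measured value is treated as a grouping factor
factor_cols <- setdiff(names(df), "value")

# One summary table per factor column, stored in a named list
summaries <- list()
for (col in factor_cols) {
  summaries[[col]] <- df %>%
    group_by(level = .data[[col]]) %>%   # .data[[col]] resolves the string to a column
    summarize(mean = mean(value), .groups = "drop")
}

print(summaries$factor1)
```

Keeping the results in a named list makes each table easy to inspect separately. If a single table is preferred, bind_rows(summaries, .id = "factor") should combine them as long as the level columns have the same type.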