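I want to summarise a data frame using dplyr. The data frame contains several factors, and for each group I want to report the count of every level of every factor.
Is there a way to do the following with dplyr without naming each factor level in the summarise statement?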
library(dplyr)
set.seed(123)
s <- rbinom(100,1,0.5)
s <- factor(s,0:1,c('M','F'))
a <- sample(1:4,100,TRUE)
a <- factor(a,1:4,c('oldest','old','young','youngest'))
w <- rnorm(100,40,10)
g <- rep(1:2,each=50)
df <- data.frame(sex=s, age=a, weight=w, group=g)
sm <- df %>% group_by(group) %>% summarise(
male = sum(ifelse(sex=='M',1,0))
,female = sum(ifelse(sex=='F',1,0))
,youngest = sum(ifelse(age=='youngest',1,0))
,young = sum(ifelse(age=='young',1,0))
,old = sum(ifelse(age=='old',1,0))
,oldest = sum(ifelse(age=='oldest',1,0))
,weight = mean(weight)
)
print(t(sm))
Result:
[,1] [,2]
group 1.000 2.00000
male 29.000 24.00000
female 21.000 26.00000
youngest 12.000 8.00000
young 13.000 17.00000
old 12.000 18.00000
oldest 13.000 7.00000
weight 37.461 40.38807
Answer 0: (score: 3)
Using dplyr (albeit in a roundabout, hacky way!):
library(tidyr)  # spread() comes from tidyr, not dplyr
df %>%
mutate(row_number1 = row_number(), row_number2 = row_number()) %>%
spread(sex, row_number1) %>%
spread(age, row_number2) %>%
group_by(group) %>%
mutate_each(funs(ifelse(is.na(.), 0, 1)), -weight) %>%
mutate(count = 1) %>%
summarize_each(funs(sum)) %>%
mutate(weight = weight / (count)) %>%
select(-count) %>%
t()
Result:
[,1] [,2]
group 1.000 2.00000
weight 37.461 40.38807
M 25.000 28.00000
F 25.000 22.00000
oldest 13.000 7.00000
old 12.000 18.00000
young 13.000 17.00000
youngest 12.000 8.00000
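mutate_each(), summarize_each() and funs() are deprecated in current dplyr releases, and spread() has been superseded by pivot_wider() in tidyr. A minimal sketch of the same idea with the newer verbs follows. It is not the answer author's code; it assumes dplyr >= 1.0 and tidyr >= 1.1, and it assumes that level names are unique across factors (as they are here), because the levels become column names. The names counts and means are introduced only for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question so the block is self-contained.
set.seed(123)
df <- data.frame(
  sex    = factor(rbinom(100, 1, 0.5), 0:1, c("M", "F")),
  age    = factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest")),
  weight = rnorm(100, 40, 10),
  group  = rep(1:2, each = 50)
)

# Stack all factor columns into one "level" column, count per group,
# then spread the levels back out into one column each.
# Assumption: no two factors share a level name.
counts <- df %>%
  select(group, where(is.factor)) %>%
  mutate(across(where(is.factor), as.character)) %>%
  pivot_longer(-group, names_to = "variable", values_to = "level") %>%
  count(group, level) %>%
  pivot_wider(names_from = level, values_from = n, values_fill = 0L)

# Group means of the numeric column.
means <- df %>%
  group_by(group) %>%
  summarise(weight = mean(weight))

sm <- left_join(counts, means, by = "group")
print(t(sm))
```

No factor level is named anywhere in the pipeline, so adding another factor column to df should only add more count columns to sm.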
Answer 1: (score: 2)
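I assume you want tables for the factors and, for the numeric variables (e.g. weight), the mean.
This does not use dplyr, but it does what you want, although the result may not be formatted the way you like.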
sapply(df, function(x) if (is.factor(x)) table(x, df$group) else tapply(x, df$group, mean))
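Because the pieces have different shapes (tables for factors, vectors for numerics), sapply() cannot simplify them and returns a list here. A small sketch of the same approach with lapply(), which always returns a list, and with the grouping column left out of the loop; the names vars and res are introduced only for illustration:

```r
# Rebuild the example data from the question so the block is self-contained.
set.seed(123)
df <- data.frame(
  sex    = factor(rbinom(100, 1, 0.5), 0:1, c("M", "F")),
  age    = factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest")),
  weight = rnorm(100, 40, 10),
  group  = rep(1:2, each = 50)
)

# Summarise every column except the grouping column itself:
# factors become level-by-group count tables, numerics become group means.
vars <- setdiff(names(df), "group")
res <- lapply(df[vars], function(x) {
  if (is.factor(x)) table(x, df$group) else tapply(x, df$group, mean)
})

res$sex     # 2 x 2 table of counts (levels by group)
res$age     # 4 x 2 table of counts (levels by group)
res$weight  # named vector with one mean per group
```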
You may also want to look at the reporttools package, which includes tableNominal and tableContinuous.