R dplyr: summarise counts of multiple factors

Time: 2016-06-14 13:06:46

Tags: r dplyr

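I want to summarise a data frame with dplyr. The data frame contains several factors, and for each group I want to report the count of every level of every factor.

Is there a way to do the following with dplyr without having to name each factor level in the summarise statement?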

library(dplyr)

set.seed(123)

s <- rbinom(100,1,0.5)
s <- factor(s,0:1,c('M','F'))
a <- sample(1:4,100,TRUE)
a <- factor(a,1:4,c('oldest','old','young','youngest'))
w <- rnorm(100,40,10)
g <- rep(1:2,each=50)

df <- data.frame(sex=s, age=a, weight=w, group=g)



sm <- df %>% group_by(group) %>% summarise(
  male = sum(ifelse(sex=='M',1,0))
  ,female = sum(ifelse(sex=='F',1,0))
  ,youngest = sum(ifelse(age=='youngest',1,0))
  ,young = sum(ifelse(age=='young',1,0))
  ,old = sum(ifelse(age=='old',1,0))
  ,oldest = sum(ifelse(age=='oldest',1,0))
  ,weight = mean(weight)
)

print(t(sm))

Result:

        [,1]     [,2]
group     1.000  2.00000
male     29.000 24.00000
female   21.000 26.00000
youngest 12.000  8.00000
young    13.000 17.00000
old      12.000 18.00000
oldest   13.000  7.00000
weight   37.461 40.38807

2 answers:

Answer 0 (score: 3)

Using dplyr (albeit in a roundabout, hacky way!):

library(tidyr)  # spread() comes from tidyr, not dplyr

df %>%
    mutate(row_number1 = row_number(), row_number2 = row_number()) %>%
    spread(sex, row_number1) %>%
    spread(age, row_number2) %>%
    group_by(group) %>%
    mutate_each(funs(ifelse(is.na(.), 0, 1)), -weight) %>%
    mutate(count = 1) %>%
    summarize_each(funs(sum)) %>%
    mutate(weight = weight / (count)) %>%
    select(-count) %>%
    t()

Result:

           [,1]     [,2]
group     1.000  2.00000
weight   37.461 40.38807
M        25.000 28.00000
F        25.000 22.00000
oldest   13.000  7.00000
old      12.000 18.00000
young    13.000 17.00000
youngest 12.000  8.00000
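
The functions used above have since been deprecated or superseded (mutate_each(), summarize_each() and funs() in dplyr, spread() in tidyr). As a rough sketch of the same idea with the newer verbs, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later, the factor columns can be stacked into long form, counted, and spread back out. The names counts, means and sm2 and the "variable_level" column naming are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
set.seed(123)
s <- factor(rbinom(100, 1, 0.5), 0:1, c("M", "F"))
a <- factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest"))
w <- rnorm(100, 40, 10)
g <- rep(1:2, each = 50)
df <- data.frame(sex = s, age = a, weight = w, group = g)

# Count every level of every factor column within each group.
counts <- df %>%
  select(group, where(is.factor)) %>%
  # Convert to character so factors with different levels stack cleanly.
  mutate(across(where(is.factor), as.character)) %>%
  pivot_longer(-group, names_to = "variable", values_to = "level") %>%
  count(group, variable, level) %>%
  # One column per variable/level pair, e.g. sex_M, age_old.
  pivot_wider(names_from = c(variable, level), values_from = n, values_fill = 0)

# Group means of the numeric column, computed separately.
means <- df %>%
  group_by(group) %>%
  summarise(weight = mean(weight))

sm2 <- left_join(counts, means, by = "group")
print(t(sm2))
```

Prefixing each column with the variable name keeps levels apart if two factors happen to share a level label. No factor level has to be named by hand, so the same code works when levels or factor columns are added.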

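Answer 1 (score: 2)

I'm assuming that you want tables for the factors and, for the numeric columns (e.g. weight), the mean.

This doesn't use dplyr, but it does what you want, although the result may not be formatted the way you'd like.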

sapply(df, function(x) if (is.factor(x)) table(x, df$group) else tapply(x, df$group, mean))
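
Because the per-column results have different shapes, sapply() cannot simplify them and returns a list; it also processes the group column itself. As a sketch of one way to get closer to the layout in the question, the pieces can be built as matrices and stacked with rbind(). The names vars, pieces and res are illustrative and not part of the original answer:

```r
# Rebuild the example data from the question.
set.seed(123)
s <- factor(rbinom(100, 1, 0.5), 0:1, c("M", "F"))
a <- factor(sample(1:4, 100, TRUE), 1:4, c("oldest", "old", "young", "youngest"))
w <- rnorm(100, 40, 10)
g <- rep(1:2, each = 50)
df <- data.frame(sex = s, age = a, weight = w, group = g)

# Summarise every column except the grouping column.
vars <- setdiff(names(df), "group")

pieces <- lapply(vars, function(v) {
  x <- df[[v]]
  if (is.factor(x)) {
    # Levels in rows, groups in columns.
    tab <- table(x, df$group)
    matrix(tab, nrow = nrow(tab),
           dimnames = list(rownames(tab), colnames(tab)))
  } else {
    # A single row holding the group means, labelled with the column name.
    m <- tapply(x, df$group, mean)
    matrix(m, nrow = 1, dimnames = list(v, names(m)))
  }
})

# Stack everything into one matrix with one column per group.
res <- do.call(rbind, pieces)
print(res)
```

Rows are labelled by factor level only, so this layout assumes that no two factors share a level name.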

You may also want to look at the reporttools package, which includes tableNominal and tableContinuous.