Summarise a grouped data frame while keeping all columns that are factor vectors

Time: 2019-05-07 17:35:22

Tags: r dplyr

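I have a large data frame containing performance data for several people over a period of time. Instead of one row per individual performance, I want a data frame with the totals/averages for each person. Here is a sample data frame: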

library(dplyr)

name <- c("dwayne", "alf", "christine", "katerina", "dwayne", "christine")
team <- c("halifax", "hamilton", "calgary", "winnipeg", "halifax", "calgary")
pos <- c("left", "middle", "middle", "right", "left", "middle")
amt1 <- c(4, 2, 5, 8, 5, 7)
amt2 <- c(12, 14, 13, 18, 17, 18)
perc1 <- c(.55, .24, .67, .45, .34, .54)
perc2 <- c(.12, .14, .16, .04, .02, .13)

# tibble() replaces data_frame(), which is deprecated
df <- tibble(team, pos, name, amt1, amt2, perc1, perc2)

So far, I have figured out how to do this for the numeric columns using group_by and summarise_at, like so:

tot <- df %>%
  group_by(name) %>%
  summarise_at(vars(amt1:amt2), sum)

av <- df %>%
  group_by(name) %>%
  summarise_at(vars(perc1:perc2), mean)

# Bind the two summaries side by side, then drop the duplicated name column
bnd <- cbind(tot, av)

bnd <- bnd[, !duplicated(colnames(bnd))]

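My problem, however, is that this approach returns a data frame without the "pos" or "team" columns. These are key pieces of information when analysing this data, but they are not numeric, so they are dropped when I summarise.

How can I return the data frame "bnd" with those factor vectors still present?

2 Answers: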

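Answer 0: (score: 0)

As long as the combination of team, pos and name is unique (that is, each name appears with only one team and one pos), you can include those variables in the grouping: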

tot <- df %>%
  group_by(team, pos, name) %>%
  summarise_at(vars(amt1:amt2), sum) %>%
  ungroup()

# A tibble: 4 x 5
  team     pos    name       amt1  amt2
  <chr>    <chr>  <chr>     <dbl> <dbl>
1 calgary  middle christine    12    31
2 halifax  left   dwayne        9    29
3 hamilton middle alf           2    14
4 winnipeg right  katerina      8    18
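
The same grouping can also produce the totals and the averages in one pass, which avoids the cbind step from the question. The following is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later (for across()), and the name `conflicts` is purely illustrative. It first checks the uniqueness condition stated above, then builds `bnd` directly.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble(
  team  = c("halifax", "hamilton", "calgary", "winnipeg", "halifax", "calgary"),
  pos   = c("left", "middle", "middle", "right", "left", "middle"),
  name  = c("dwayne", "alf", "christine", "katerina", "dwayne", "christine"),
  amt1  = c(4, 2, 5, 8, 5, 7),
  amt2  = c(12, 14, 13, 18, 17, 18),
  perc1 = c(.55, .24, .67, .45, .34, .54),
  perc2 = c(.12, .14, .16, .04, .02, .13)
)

# Names that appear with more than one team/pos combination.
# An empty result means grouping by team, pos and name gives one row per person.
conflicts <- df %>%
  distinct(name, team, pos) %>%
  count(name) %>%
  filter(n > 1)

# Sum the amount columns and average the percentage columns in a single summarise()
bnd <- df %>%
  group_by(team, pos, name) %>%
  summarise(
    across(amt1:amt2, sum),
    across(perc1:perc2, mean),
    .groups = "drop"
  )
```

If `conflicts` is not empty, some people would be split across several rows, and the approach in the next answer may be a better fit.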

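Answer 1: (score: 0)

If you don't need to summarise each player's results separately by team or position, another way to handle players with multiple teams/positions is to keep all of them: for each name, combine the unique values of team into a single string, and do the same for pos. For example: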

library(tidyverse)

# Added a couple of additional rows for illustration
df = data.frame(name=c("dwayne", "alf", "christine", "katerina", "dwayne", "christine", "christine", "dwayne"),
                team= c("halifax", "hamilton", "calgary", "winnipeg", "halifax", "calgary", "halifax","halifax"),
                pos= c("left", "middle", "middle", "right", "left", "middle", "middle","middle"),
                amt1= c(4, 2, 5, 8, 5, 7,5,5),
                amt2 = c(12, 14, 13, 18, 17, 18,17,13),
                perc1= c(.55, .24, .67, .45, .34, .54,.56,.51),
                perc2= c(.12, .14, .16, .04, .02, .13, .11, .09))

df %>% 
  group_by(name) %>% 
  mutate(team = paste(unique(team), collapse="-"),
         pos = paste(unique(pos), collapse="-")) %>% 
  group_by(name, team, pos) %>% 
  summarise_at(vars(amt1:amt2), sum)

  name      team            pos          amt1  amt2
1 alf       hamilton        middle          2    14
2 christine calgary-halifax middle         17    48
3 dwayne    halifax         left-middle    14    42
4 katerina  winnipeg        right           8    18
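
The mutate-then-regroup step can likely be folded into a single summarise() that also computes the averages the question asked for. This is a sketch under assumptions rather than the answerer's code: it assumes dplyr 1.0.0 or later (for across()), and `res` is an illustrative name.

```r
library(dplyr)

# Same extended sample data as in the answer above
df <- data.frame(
  name  = c("dwayne", "alf", "christine", "katerina", "dwayne", "christine", "christine", "dwayne"),
  team  = c("halifax", "hamilton", "calgary", "winnipeg", "halifax", "calgary", "halifax", "halifax"),
  pos   = c("left", "middle", "middle", "right", "left", "middle", "middle", "middle"),
  amt1  = c(4, 2, 5, 8, 5, 7, 5, 5),
  amt2  = c(12, 14, 13, 18, 17, 18, 17, 13),
  perc1 = c(.55, .24, .67, .45, .34, .54, .56, .51),
  perc2 = c(.12, .14, .16, .04, .02, .13, .11, .09)
)

res <- df %>%
  group_by(name) %>%
  summarise(
    # Collapse each person's distinct teams and positions into one string
    across(c(team, pos), ~ paste(unique(.x), collapse = "-")),
    # Totals for the amount columns, means for the percentage columns
    across(amt1:amt2, sum),
    across(perc1:perc2, mean),
    .groups = "drop"
  )
```

Grouping by name alone guarantees one row per person, at the cost of team and pos no longer being clean categorical values for anyone who changed team or position.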