Is it possible to combine summarise and summarise_at within a single group_by using dplyr?

Asked: 2019-01-16 22:57:23

Tags: r group-by dplyr data-manipulation

Edit: I just realized that the side column in the data is not used at all, so please ignore it for the purposes of this example.

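I have a large data frame of play-by-play basketball data, and I would like to perform a group_by, summarise, and summarise_at on it. Here is a subset of my data frame: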

> dput(zed)
structure(list(side = c("right", "right", "right", "right", "right", 
"right", "left", "right", "right", "right", "left", "right", 
"left", "left", "left", "right", "right", "right", "left", "right"
), result = c("twopointmiss", "twopointmade", "twopointmade", 
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss", 
"twopointmade", "twopointmade", "twopointmade", "twopointmade", 
"twopointmade", "twopointmiss", "twopointmiss", "twopointmiss", 
"twopointmiss", "twopointmade", "twopointmade", "twopointmiss", 
"twopointmiss"), zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 
4, 4, 4, 1, 1, 2, 3, 2, 3, 4), team = c("Bos", "Bos", "Bos", 
"Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Bos", "Min", "Min", 
"Min", "Min", "Min", "Min", "Min", "Min", "Min", "Min")), row.names = c(3L, 
5L, 8L, 14L, 17L, 23L, 28L, 30L, 39L, 41L, 42L, 43L, 47L, 52L, 
54L, 58L, 60L, 63L, 69L, 72L), class = "data.frame")

>   zed
    side       result zonenumber team
3  right twopointmiss          1  Bos
5  right twopointmade          1  Bos
8  right twopointmade          1  Bos
14 right twopointmiss          1  Bos
17 right twopointmade          2  Bos
23 right twopointmade          3  Bos
28  left twopointmiss          2  Bos
30 right twopointmade          3  Bos
39 right twopointmade          2  Bos
41 right twopointmade          3  Bos
42  left twopointmade          4  Min
43 right twopointmade          4  Min
47  left twopointmiss          4  Min
52  left twopointmiss          1  Min
54  left twopointmiss          1  Min
58 right twopointmiss          2  Min
60 right twopointmade          3  Min
63 right twopointmade          2  Min
69  left twopointmiss          3  Min
72 right twopointmiss          4  Min

In the example below I use only summarise, because I am not currently sure how to use summarise_at together with summarise in the same group_by call:

> grouped.df <- zed %>%
+   dplyr::group_by(team) %>%
+   dplyr::summarise(
+     shotsMade = sum(result == "twopointmade"),
+     shotsAtt = n(),
+     shotsPct = round(shotsMade / shotsAtt),
+     points = 2 * shotsMade,
+
+     z1Made = sum(zonenumber == 1),
+     z2Made = sum(zonenumber == 2),
+     z3Made = sum(zonenumber == 3),
+     z4Made = sum(zonenumber == 4)
+   )
> grouped.df
# A tibble: 2 x 9
  team  shotsMade shotsAtt shotsPct points z1Made z2Made z3Made z4Made
  <chr>     <int>    <int>    <dbl>  <dbl>  <int>  <int>  <int>  <int>
1 Bos           7       10        1     14      4      3      3      0
2 Min           4       10        0      8      2      2      2      4

In the example above, I would like to create the first 4 columns (shotsMade, shotsAtt, shotsPct, points) with summarise, and create the z#Made columns with summarise_at. In my full data, I plan to create about 30 unique columns with summarise and about 80 similar columns with summarise_at.

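To keep the example small, I did not want to bring my whole data frame into it. If I can get both summarise and summarise_at working in the example above, I will be able to do the same on my full data frame.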

Any thoughts on this would be appreciated, as I am particularly keen to get better with the summarise_at function in dplyr. Thanks!

1 Answer:

Answer 0: (score: 2)

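I don't think there is a way to use summarise and summarise_at at the same time: once the first call has collapsed the rows and dropped the columns it did not mention, the second call no longer has the data it needs.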
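To make the point concrete, here is a minimal sketch of what is left after a grouped summarise. It rebuilds the question's 20 rows without the unused side column; the names `zed` and `after` are just for illustration.

```r
library(dplyr)

# The question's sample data, minus the unused `side` column
zed <- data.frame(
  result = c("twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmade",
             "twopointmade", "twopointmade", "twopointmade", "twopointmade",
             "twopointmiss", "twopointmiss", "twopointmiss", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmiss"),
  zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 4, 4, 4, 1, 1, 2, 3, 2, 3, 4),
  team = rep(c("Bos", "Min"), each = 10),
  stringsAsFactors = FALSE
)

after <- zed %>%
  group_by(team) %>%
  summarise(shotsMade = sum(result == "twopointmade"))

names(after)                   # only the grouping column and the new summary
"zonenumber" %in% names(after) # FALSE: a later summarise_at has nothing to work on
```

Chaining a summarise_at after this call would therefore see only team and shotsMade, not zonenumber or result.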

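So, instead, we can use mutate and mutate_at and then drop some rows (and perhaps some columns). The only difference between this and magically applying summarise and summarise_at together is that the mutate approach does not drop any variables on its own. Whether that is beneficial depends on your use case, I guess. Below, I add an extra select line that drops all the columns that the summarise combination would have dropped.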

select(-one_of(setdiff(names(zed), "team")))
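The full pipeline is not shown above, so the following is only a sketch of how the mutate / mutate_at idea might be put together, not the answer's exact code. The logical indicator columns (z1Made to z4Made before summing), the `matches()` pattern, and the use of `slice(1)` to keep one row per team are assumptions made for illustration.

```r
library(dplyr)

# The question's sample data, minus the unused `side` column
zed <- data.frame(
  result = c("twopointmiss", "twopointmade", "twopointmade", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmade",
             "twopointmade", "twopointmade", "twopointmade", "twopointmade",
             "twopointmiss", "twopointmiss", "twopointmiss", "twopointmiss",
             "twopointmade", "twopointmade", "twopointmiss", "twopointmiss"),
  zonenumber = c(1, 1, 1, 1, 2, 3, 2, 3, 2, 3, 4, 4, 4, 1, 1, 2, 3, 2, 3, 4),
  team = rep(c("Bos", "Min"), each = 10),
  stringsAsFactors = FALSE
)

grouped.df <- zed %>%
  group_by(team) %>%
  # "Unique" columns: same expressions as in the question, but via mutate,
  # so every row of a team carries the team-level value
  mutate(
    shotsMade = sum(result == "twopointmade"),
    shotsAtt  = n(),
    shotsPct  = round(shotsMade / shotsAtt),
    points    = 2 * shotsMade
  ) %>%
  # Assumed helper step: one logical indicator column per zone
  mutate(
    z1Made = zonenumber == 1,
    z2Made = zonenumber == 2,
    z3Made = zonenumber == 3,
    z4Made = zonenumber == 4
  ) %>%
  # "Similar" columns: apply one function to all of them at once, per team
  mutate_at(vars(matches("^z[0-9]Made$")), sum) %>%
  # Keep a single row per team, as summarise would
  slice(1) %>%
  ungroup() %>%
  # Drop the original columns that summarise would have dropped
  select(-one_of(setdiff(names(zed), "team")))

grouped.df
```

On the sample data this should give one row per team with the same nine columns as the summarise version in the question. As there, the z#Made columns count every shot taken in a zone rather than only made shots, because they use the same `zonenumber == k` expressions.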