Finding the maximum value per group with dplyr in R

Time: 2016-10-20 14:44:14

Tags: r dplyr

I am trying to find, for each summer month, the airline with the largest number of flights.

library(dplyr)

max_flights_all_c <- nycflights13::flights %>%
  group_by(carrier, month) %>%
  filter(month == 6 | month == 7 | month == 8 | month == 9) %>%
  summarise(n = n())

This currently gives me:

carrier month   n
9E  7   1494
9E  8   1456
9E  9   1540
AA  6   2757
AA  7   2882
AA  8   2856
AA  9   2614
AS  6   60
AS  7   62
AS  8   62
AS  9   60
B6  6   4622
B6  7   4984

But I want only the row with the maximum n for each month.

2 answers:

Answer 0: (score: 4)

After the summarise step, we group by 'month' and use slice with which.max to keep the row with the maximum 'n' in each group.

max_flights_all_c <- nycflights13::flights %>%
                          group_by(carrier,month)%>%
                          filter(month %in% 6:9) %>%
                          summarise(n = n()) %>%
                          group_by(month) %>%
                          slice(which.max(n))
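One detail worth illustrating: which.max returns only the first maximum, so a month in which two carriers tie yields a single row. The sketch below uses a small hypothetical data frame (the name counts and its values are made up for illustration, standing in for the summarised flights table) to contrast that with filter(n == max(n)), which keeps every tied row.

```r
library(dplyr)

# Hypothetical toy counts standing in for the summarised flights table
counts <- data.frame(
  carrier = c("AA", "UA", "AA", "UA"),
  month   = c(6, 6, 7, 7),
  n       = c(10, 25, 30, 30),
  stringsAsFactors = FALSE
)

# First maximum per month: same idea as slice(which.max(n)) in the answer
first_max <- counts %>%
  group_by(month) %>%
  slice(which.max(n)) %>%
  ungroup()

# Every row tied for the maximum per month
all_max <- counts %>%
  group_by(month) %>%
  filter(n == max(n)) %>%
  ungroup()
```

Here first_max has one row per month, while all_max keeps both carriers for month 7. In dplyr 1.0.0 and later, slice_max(n, n = 1) is another option; it keeps ties by default.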

Answer 1: (score: 2)

Thanks to @Henk for the updated data.table solution:

setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month]

   month carrier    N
1:     6      UA 4975
2:     7      UA 5066
3:     8      UA 5124
4:     9      EV 4725

The original solution is in the answer's revision history.
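To make the two chained steps of the data.table one-liner easier to follow, here is a minimal sketch on a hypothetical toy table (flights_dt and its rows are invented for illustration, not taken from nycflights13). It splits the chain into a counting step and a per-month selection step.

```r
library(data.table)

# Hypothetical toy flight records, one row per flight
flights_dt <- data.table(
  carrier = c("AA", "AA", "UA", "UA", "UA", "AA", "UA"),
  month   = c(6, 6, 6, 6, 6, 7, 5)
)

# Step 1: count flights per carrier and month, keeping months 6 to 9 only.
# An unnamed .N in j produces a count column called N.
counts <- flights_dt[month %between% c(6, 9), .N, keyby = .(carrier, month)]

# Step 2: within each month, keep the row with the largest count
top <- counts[, .SD[which.max(N)], by = month]
```

Building a separate data.table, as here, also avoids converting the packaged data set by reference with setDT, which may or may not be desirable depending on the session.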

Microbenchmark (for anyone who cares):

library(microbenchmark)
microbenchmark(henk=setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month],
               akrun=nycflights13::flights %>%
                 group_by(carrier,month)%>%
                 filter(month %in% 6:9) %>%
                 summarise(n = n()) %>%
                 group_by(month) %>%
                 slice(which.max(n)))

Unit: milliseconds
  expr       min       lq      mean    median        uq       max neval
  henk  5.612305  6.41659  7.416813  6.953205  7.515347  49.38172   100
 akrun 45.529320 47.51715 51.943065 48.882663 49.834458 221.39357   100