I'm trying to find, for each summer month, the airline with the largest number of flights.
max_flights_all_c <- nycflights13::flights %>%
  group_by(carrier, month) %>%
  filter(month == 6 | month == 7 | month == 8 | month == 9) %>%
  summarise(n = n())
This currently gives me:
carrier month n
9E 7 1494
9E 8 1456
9E 9 1540
AA 6 2757
AA 7 2882
AA 8 2856
AA 9 2614
AS 6 60
AS 7 62
AS 8 62
AS 9 60
B6 6 4622
B6 7 4984
But I want only the row with the maximum n for each month.
Answer 0 (score: 4)
After the summarise step, group by month and use slice to keep the row with the maximum n in each group.
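library(dplyr)

max_flights_all_c <- nycflights13::flights %>%
  group_by(carrier, month) %>%
  filter(month %in% 6:9) %>%   # keep June through September
  summarise(n = n()) %>%       # number of flights per carrier and month
  group_by(month) %>%
  slice(which.max(n))          # first row with the largest n in each month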
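Note that which.max returns only the first position of the maximum, so if two carriers tied for the most flights in a month, only one of them would be kept. A minimal sketch of a variant that keeps ties, assuming that is the behaviour wanted; the name max_flights_ties is made up for illustration, and count() is used here as shorthand for group_by() plus summarise(n = n()):

```r
library(dplyr)

# Count flights per carrier and month for June through September,
# then keep every carrier that reaches the monthly maximum (ties included).
# `max_flights_ties` is a hypothetical name, not part of the original answer.
max_flights_ties <- nycflights13::flights %>%
  filter(month %in% 6:9) %>%
  count(carrier, month) %>%    # yields columns carrier, month, n
  group_by(month) %>%
  filter(n == max(n)) %>%      # keeps all rows equal to the group maximum
  ungroup()
```

When there are no ties, this should return the same carriers as the slice(which.max(n)) version.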
Answer 1 (score: 2)
Thanks to @Henk for the updated data.table solution:
library(data.table)
setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month]
month carrier N
1: 6 UA 4975
2: 7 UA 5066
3: 8 UA 5124
4: 9 EV 4725
The original solution is in the answer's revision history.
Microbenchmark (for anyone who cares):
library(microbenchmark)
microbenchmark(
  henk = setDT(nycflights13::flights)[month %between% c(6,9), .N, keyby = .(carrier, month)][, .SD[which.max(N)], month],
  akrun = nycflights13::flights %>%
    group_by(carrier, month) %>%
    filter(month %in% 6:9) %>%
    summarise(n = n()) %>%
    group_by(month) %>%
    slice(which.max(n))
)
Unit: milliseconds
expr min lq mean median uq max neval
henk 5.612305 6.41659 7.416813 6.953205 7.515347 49.38172 100
akrun 45.529320 47.51715 51.943065 48.882663 49.834458 221.39357 100