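I have searched for similar threads but could not find a solution.
I grouped the dataset below by carrier and created new variables, which successfully show the average and total delay times. Now I just want to sort the data by average delay, but when I run the code below, every row returns the same data. Can anyone help me figure out where I went wrong?
I am using the dplyr package. The dataset is "flights", and NA values have already been filtered out with: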
filter(!is.na(dep_delay), !is.na(arr_delay)).
I got the data and the exercise from section 5.6.7 of this resource: http://r4ds.had.co.nz/transform.html#exercises-11
bycarrier %>%
  transmute(
    arrsum = sum(arr_delay),
    arravg = mean(arr_delay),
    depsum = sum(dep_delay),
    depavg = mean(dep_delay)
  ) %>%
  arrange(desc(arravg))
This returns:
Adding missing grouping variables: `carrier`
Source: local data frame [327,346 x 5]
Groups: carrier [16]
carrier arrsum arravg depsum depavg
<chr> <dbl> <dbl> <dbl> <dbl>
1 F9 14928 21.9207 13757 20.20117
2 F9 14928 21.9207 13757 20.20117
3 F9 14928 21.9207 13757 20.20117
4 F9 14928 21.9207 13757 20.20117
5 F9 14928 21.9207 13757 20.20117
6 F9 14928 21.9207 13757 20.20117
7 F9 14928 21.9207 13757 20.20117
8 F9 14928 21.9207 13757 20.20117
9 F9 14928 21.9207 13757 20.20117
10 F9 14928 21.9207 13757 20.20117
# ... with 327,336 more rows
Answer 0 (score: 1)
I think you need to use the function summarise instead of transmute, as follows:
bycarrier %>%
  summarise(
    arrsum = sum(arr_delay),
    arravg = mean(arr_delay),
    depsum = sum(dep_delay),
    depavg = mean(dep_delay)
  ) %>%
  arrange(desc(arravg))
This gives the output:
# A tibble: 16 x 5
carrier arrsum arravg depsum depavg
<chr> <dbl> <dbl> <dbl> <dbl>
1 F9 14928 21.9207048 13757 20.201175
2 FL 63868 20.1159055 59074 18.605984
3 EV 807324 15.7964311 1013928 19.838929
4 YV 8463 15.5569853 10281 18.898897
5 OO 346 11.9310345 365 12.586207
6 MQ 269767 10.7747334 261521 10.445381
7 WN 116214 9.6491199 212717 17.661657
8 B6 511194 9.4579733 700883 12.967548
9 9E 127624 7.3796692 284306 16.439574
10 UA 205589 3.5580111 694361 12.016908
11 US 42232 2.1295951 74261 3.744693
12 VX 9027 1.7644644 65263 12.756646
13 DL 78366 1.6443409 439595 9.223950
14 AA 11638 0.3642909 273758 8.569130
15 HA -2365 -6.9152047 1676 4.900585
16 AS -7041 -9.9308886 4134 5.830748
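To see why the two verbs behave differently, here is a minimal sketch on a small made-up table. The names toy and bycarrier_toy and the delay values are hypothetical stand-ins for the grouped flights data, not part of the original exercise. On a grouped data frame, transmute keeps one row per input row and repeats each group's aggregate on every row of that group, which is why the question's output shows 327,346 rows of repeated values. summarise collapses each group to a single row.

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table (values are made up).
toy <- tibble(
  carrier   = c("AA", "AA", "F9", "F9", "F9"),
  arr_delay = c(10, 20, 30, 60, 90)
)

bycarrier_toy <- group_by(toy, carrier)

# transmute() keeps one row per input row; the group mean is repeated on each.
per_row <- bycarrier_toy %>%
  transmute(arravg = mean(arr_delay)) %>%
  ungroup()

# summarise() collapses each group to a single row, which can then be sorted.
per_group <- bycarrier_toy %>%
  summarise(arravg = mean(arr_delay)) %>%
  arrange(desc(arravg))

nrow(per_row)    # same number of rows as the input
nrow(per_group)  # one row per carrier
```

Here per_row has as many rows as toy, while per_group has one row per carrier, so arrange(desc(arravg)) produces the ranking the question was after.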