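I apologize if this is a simple or duplicate question, but after a few hours of Googling I can't find anything that matches what I'm looking for. I'm new to R.
My goal is to find the percentage of Delta Air Lines flights that arrived late, broken down by the airport they departed from. Here is my code so far: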
#install.packages("nycflights13")
#library(nycflights13)
library(dplyr)  # needed for filter(), group_by(), summarise() and %>%
flts <- nycflights13::flights
# filtering by Delta Airlines and late arrival dates
all_delta_flights <- filter(flts, carrier == "DL")
all_late_delta_flights <- filter(flts, carrier == "DL", arr_delay > 0)
# group by departing airport
by_origin <- all_delta_flights %>% group_by(origin)
by_origin_late <- all_late_delta_flights %>% group_by(origin)
# get number of flights by departure airport
by_origin_late %>% summarise(n = n())
by_origin %>% summarise(n = n())
The last two lines of code output the following two tables.
# A tibble: 3 x 2
origin n
<chr> <int>
1 EWR 1725
2 JFK 6353
3 LGA 8335
# A tibble: 3 x 2
origin n
<chr> <int>
1 EWR 4342
2 JFK 20701
3 LGA 23067
What I'd like to do now is create a new table that combines the two n columns, e.g.
# A tibble: 3 x 2
origin n
<chr> <double>
1 EWR .397 # == 1725 / 4342
2 JFK ??? # == 6353 / 20701
3 LGA ???
Is there a simple way to do this in R?
Thanks!
Answer 0 (score: 4)
You can do this in a single pipeline, without a join:
flts %>%
filter(carrier == "DL") %>%
group_by(origin) %>%
summarize(percent = sum(arr_delay > 0) / n())
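The pipeline above relies on R treating TRUE as 1 when a logical vector is summed, so sum(condition) counts the rows that meet the condition. A minimal base-R sketch of that idea, using made-up delay values (the vector `delays` is hypothetical and not taken from nycflights13):

```r
# Hypothetical arrival delays in minutes (made-up values for illustration)
delays <- c(12, -5, 0, 30, -2)

# Comparing against 0 yields a logical vector: TRUE where the flight was late
late <- delays > 0

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of late flights
n_late <- sum(late)

# Dividing by the number of flights gives the proportion that were late
share <- n_late / length(late)
share

# mean() of a logical vector computes the same proportion in one step
mean(late)
```

Inside summarize(), n() plays the role of length(late) for each group.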
It looks like the arr_delay column contains NA values, so you may need to add na.rm = TRUE to sum():
flts %>%
filter(carrier == "DL") %>%
group_by(origin) %>%
summarize(percent = sum(arr_delay > 0, na.rm = TRUE) / n())
# A tibble: 3 x 2
# origin percent
# <chr> <dbl>
#1 EWR 0.397
#2 JFK 0.307
#3 LGA 0.361
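One thing worth noting about the NA handling: with na.rm in sum() but n() as the denominator, rows with a missing arr_delay still count toward the total, which matches the asker's own division (1725 / 4342). If you would rather exclude those rows from both the numerator and the denominator, mean() with na.rm is one option. The sketch below compares the two on a small made-up data frame; the data and the column names share_of_all and share_of_recorded are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Made-up toy data (not from nycflights13); NA means no arrival delay was recorded
toy <- data.frame(
  origin    = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  arr_delay = c(10, -3, NA, 5, NA, 20)
)

result <- toy %>%
  group_by(origin) %>%
  summarize(
    # NA rows are dropped from the count of late flights but kept in n()
    share_of_all = sum(arr_delay > 0, na.rm = TRUE) / n(),
    # NA rows are dropped from both the numerator and the denominator
    share_of_recorded = mean(arr_delay > 0, na.rm = TRUE)
  )

result
# AAA: share_of_all = 2/4 = 0.5, share_of_recorded = 2/3
# BBB: share_of_all = 1/2 = 0.5, share_of_recorded = 1/1 = 1
```

Which denominator is appropriate depends on whether flights without a recorded arrival delay should count as "not late" or be left out altogether.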