How do I do arithmetic on columns from two different tables in R?

Asked: 2018-05-15 14:45:02

Tags: r dplyr

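Apologies if this is a simple or duplicate question, but after several hours of searching Google I can't find anything that matches what I'm looking for. I'm new to R.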

My goal is to find the percentage of Delta Air Lines flights that arrived late, broken down by the airport they departed from. Here is my code so far:

# install.packages(c("nycflights13", "dplyr"))
library(dplyr)  # provides filter(), group_by(), summarise(), n() and %>%
flts <- nycflights13::flights

# keep Delta flights, and separately the Delta flights that arrived late
all_delta_flights <- filter(flts, carrier == "DL")
all_late_delta_flights <- filter(flts, carrier == "DL", arr_delay > 0)

# group by departing airport
by_origin <- all_delta_flights %>% group_by(origin)
by_origin_late <- all_late_delta_flights %>% group_by(origin) 

# get number of flights by departure airport
by_origin_late %>% summarise(n = n())
by_origin %>% summarise(n = n())

The last two lines of code print the following two tables.

# A tibble: 3 x 2
  origin     n
  <chr>  <int>
1 EWR     1725
2 JFK     6353
3 LGA     8335

# A tibble: 3 x 2
  origin     n
  <chr>  <int>
1 EWR     4342
2 JFK    20701
3 LGA    23067

What I want to do now is create a new table that combines the two n columns, for example:

# A tibble: 3 x 2
  origin     n
  <chr>  <double>
1 EWR     .397     #  == 1725 / 4342
2 JFK     ???      #  == 6353 / 20701
3 LGA     ???

Is there a simple way to do this in R?

Thanks!

1 Answer:

Answer 0 (score: 4)

You can do this in a single pipeline, without a join:

flts %>% 
    filter(carrier == "DL") %>% 
    group_by(origin) %>% 
    summarize(percent = sum(arr_delay > 0) / n())
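
One caveat about that pipeline: `sum()` over a logical vector returns `NA` as soon as any element is `NA`. A minimal base R sketch of that behaviour, where the vector `delay` is made-up illustration data and not taken from `flights`:

```r
# Hypothetical arrival delays in minutes; NA stands for a flight with no recorded arrival
delay <- c(12, -5, NA, 30, 0)

# delay > 0 is TRUE, FALSE, NA, TRUE, FALSE, so the plain sum is NA
late_plain <- sum(delay > 0)

# na.rm = TRUE drops the NA comparison before summing, leaving the two TRUEs
late_na_rm <- sum(delay > 0, na.rm = TRUE)

late_plain
late_na_rm
```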

It looks like the arr_delay column contains NA values, so you will probably need to add na.rm = TRUE to sum():

flts %>% 
    filter(carrier == "DL") %>% 
    group_by(origin) %>% 
    summarize(percent = sum(arr_delay > 0, na.rm = TRUE) / n())

# A tibble: 3 x 2
#  origin percent
#  <chr>    <dbl>
#1 EWR      0.397
#2 JFK      0.307
#3 LGA      0.361
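
If you do want to combine the two summary tables literally, as the question asks, a join on origin is one way to do it. This is only a sketch: the names `late_counts`, `all_counts`, and `late_share` are hypothetical, and the counts are typed in by hand from the two tables in the question rather than computed from `flights`.

```r
library(dplyr)

# Counts copied from the two summary tables printed in the question
late_counts <- tibble(origin = c("EWR", "JFK", "LGA"), n = c(1725, 6353, 8335))
all_counts  <- tibble(origin = c("EWR", "JFK", "LGA"), n = c(4342, 20701, 23067))

# Match rows on origin; the suffixes keep the two n columns apart
late_share <- late_counts %>%
    inner_join(all_counts, by = "origin", suffix = c("_late", "_all")) %>%
    mutate(percent = n_late / n_all) %>%
    select(origin, percent)

late_share
```

In practice you would pass the results of the two `summarise()` calls to `inner_join()` instead of retyping the numbers. The single-pipeline version is shorter, but the join makes the numerator and denominator explicit.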