Conditionally aggregate data frame rows that do not contain certain values

Date: 2019-11-25 14:20:57

Tags: r dplyr

In df, I want to keep only the rows whose intersect_street matches one of the street names in streets, and add the intersection_distance_meters of each removed row to the kept row above it.


> streets
[1] "FRONT ST" "2ND ST"   "3RD ST"   "4TH ST"  

> df
              intersection segment_key intersection_distance_meters intersect_street
1       ARCH ST & FRONT ST         1EW                           81         FRONT ST
2     ARCH ST & MASCHER ST         2EW                           60       MASCHER ST
3         ARCH ST & 2ND ST         3EW                           57           2ND ST
4 ARCH ST & LITTLE BOYS CT         4EW                           28   LITTLE BOYS CT
5       ARCH ST & BREAD ST         5EW                           83         BREAD ST
6         ARCH ST & 3RD ST         6EW                          135           3RD ST
7         ARCH ST & 4TH ST         7EW                          144           4TH ST

Desired output

              intersection segment_key intersection_distance_meters intersect_street
1       ARCH ST & FRONT ST         1EW                          141         FRONT ST
2         ARCH ST & 2ND ST         3EW                          168           2ND ST
3         ARCH ST & 3RD ST         6EW                          135           3RD ST
4         ARCH ST & 4TH ST         7EW                          144           4TH ST

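I have been using lead() from dplyr to add the next row's intersect_street and intersection_distance_meters as new columns and then summing them conditionally, but I run into trouble when several non-primary intersections occur in a row (for example, rows 4 and 5 above).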

Data

df <- structure(list(intersection = c("ARCH ST & FRONT ST", "ARCH ST & MASCHER ST", 
"ARCH ST & 2ND ST", "ARCH ST & LITTLE BOYS CT", "ARCH ST & BREAD ST", 
"ARCH ST & 3RD ST", "ARCH ST & 4TH ST"), segment_key = c("1EW", 
"2EW", "3EW", "4EW", "5EW", "6EW", "7EW"), intersection_distance_meters = c(81, 
60, 57, 28, 83, 135, 144), intersect_street = c("FRONT ST", "MASCHER ST", 
"2ND ST", "LITTLE BOYS CT", "BREAD ST", "3RD ST", "4TH ST")), row.names = c(NA, 
7L), class = "data.frame")

streets <- c("FRONT ST", "2ND ST", "3RD ST", "4TH ST")

1 answer:

Answer 0 (score: 1)

I think this is what you want. I created a few extra helper columns and left them in so the logic is clear.

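library(dplyr)

df %>%
  mutate(
    keep = intersect_street %in% streets,  # TRUE for rows whose street is in `streets`
    grouper = cumsum(keep)                 # a kept row starts a new group; dropped rows join the group above
  ) %>%
  group_by(grouper) %>%
  # distance of the kept row plus the dropped rows that follow it
  mutate(total_intersection_dist = sum(intersection_distance_meters)) %>%
  slice(1)  # keep only the first row of each group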
# # A tibble: 4 x 7
# # Groups:   grouper [4]
#   intersection       segment_key intersection_distance_met~ intersect_street keep  grouper total_intersection_di~
#   <chr>              <chr>                            <dbl> <chr>            <lgl>   <int>                  <dbl>
# 1 ARCH ST & FRONT ST 1EW                                 81 FRONT ST         TRUE        1                    141
# 2 ARCH ST & 2ND ST   3EW                                 57 2ND ST           TRUE        2                    168
# 3 ARCH ST & 3RD ST   6EW                                135 3RD ST           TRUE        3                    135
# 4 ARCH ST & 4TH ST   7EW                                144 4TH ST           TRUE        4                    144
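
The output above keeps the helper columns and puts the summed distance in a new column, so it does not yet have the exact shape of the desired output. A minimal sketch of one way to get there, using the same cumsum() grouping idea: overwrite intersection_distance_meters with the group total and drop the helper column. The name result and the filter(grouper > 0) step are assumptions, not part of the original answer. The filter drops any rows that come before the first kept street, since they have no kept row above them to receive their distance (the sample data has no such rows).

```r
library(dplyr)

df <- data.frame(
  intersection = c("ARCH ST & FRONT ST", "ARCH ST & MASCHER ST",
                   "ARCH ST & 2ND ST", "ARCH ST & LITTLE BOYS CT",
                   "ARCH ST & BREAD ST", "ARCH ST & 3RD ST", "ARCH ST & 4TH ST"),
  segment_key = c("1EW", "2EW", "3EW", "4EW", "5EW", "6EW", "7EW"),
  intersection_distance_meters = c(81, 60, 57, 28, 83, 135, 144),
  intersect_street = c("FRONT ST", "MASCHER ST", "2ND ST", "LITTLE BOYS CT",
                       "BREAD ST", "3RD ST", "4TH ST"),
  stringsAsFactors = FALSE
)
streets <- c("FRONT ST", "2ND ST", "3RD ST", "4TH ST")

result <- df %>%
  # running count of kept rows: each kept row opens a new group,
  # and the dropped rows after it inherit the same group id
  mutate(grouper = cumsum(intersect_street %in% streets)) %>%
  # assumption: rows before the first kept street are discarded
  filter(grouper > 0) %>%
  group_by(grouper) %>%
  # replace each distance with the total for its group
  mutate(intersection_distance_meters = sum(intersection_distance_meters)) %>%
  slice(1) %>%          # the first row of each group is the kept street
  ungroup() %>%
  select(-grouper)      # drop the helper column

result
```

With the sample data this should give the four rows for FRONT ST, 2ND ST, 3RD ST and 4TH ST, with distances 141, 168, 135 and 144, in the original four columns.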