Remove certain rows based on matching criteria

Time: 2018-10-12 18:55:24

Tags: r dataframe subset

I have this dataset:

data <- data.frame(trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
                   comment = c("void", "", "", "", "", "void", "", "void"),
                   paid = c(0, 100, 100, 250, 250, 0, 125, 0))

print(data)
#trip_id comment paid 
#   456B    void    0
#   123A          100
#   123A          100
#   456B          250
#   456B          250
#   123A    void    0
#   789C          125
#   789C    void    0

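I want to be able to programmatically remove the rows with "void" in the comment field and, for each such "void" row, one more row with the same trip_id. Using the example, the output would look like this: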

print(solution)
#trip_id comment paid 
#   123A          100
#   456B          250

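3 answers:

Answer 0 (score: 2)

After the group_by, get the index of the rows that have "void" in the comment column, and use slice to remove each of those rows together with the row next to it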

library(dplyr)
data %>%
   group_by(trip_id) %>%
   arrange(trip_id, comment != "void") %>%
   slice(setdiff(row_number(), which(comment == "void") + 0:1))
# A tibble: 2 x 3
# Groups:   trip_id [2]
#  trip_id comment  paid
#  <fct>   <fct>   <dbl>
#1 123A    ""        100
#2 456B    ""        250
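
As a side note, the `which(comment == "void") + 0:1` trick fits the case of one "void" row per trip_id. Below is a minimal sketch of a variant that should also cope with several "void" rows per trip, assuming the rule is "drop each void row plus one non-void row of the same trip". The name `result` is only illustrative. It sorts the void rows to the top of each trip and then drops the first 2 * (number of voids) rows of that trip.

```r
library(dplyr)

data <- data.frame(
  trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
  comment = c("void", "", "", "", "", "void", "", "void"),
  paid    = c(0, 100, 100, 250, 250, 0, 125, 0)
)

result <- data %>%
  # FALSE sorts before TRUE, so void rows come first within each trip_id
  arrange(trip_id, comment != "void") %>%
  group_by(trip_id) %>%
  # with k void rows in a trip, rows 1..k are the voids and
  # rows (k + 1)..(2 * k) are the non-void rows paired with them
  filter(row_number() > 2 * sum(comment == "void")) %>%
  ungroup()

print(result)
```

On the example data this leaves one row for 123A and one for 456B, and nothing for 789C.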

Answer 1 (score: 2)

Another dplyr solution (not as elegant as @akrun's solution):

library(dplyr)

# get ids to exclude
excl <- data[ data$comment == "void", "trip_id"]

data %>% 
  group_by(trip_id) %>% 
  mutate(rn = if_else(comment == "void", NA_integer_, row_number())) %>% 
  filter(trip_id %in% excl & rn > min(rn, na.rm = TRUE)) %>% 
  ungroup() %>% 
  select(-rn)

# # A tibble: 2 x 3
# trip_id comment  paid
# <fct>   <fct>   <dbl>
# 1 123A    ""        100
# 2 456B    ""        250
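
To make the helper column easier to follow, here is a small sketch that stops after the mutate step so you can look at `rn` directly. The name `step` is just for illustration. Void rows get NA, every other row gets its position within its trip_id, and the later filter then drops the non-void row with the smallest position. Note that, as written, the filter also keeps only trips that appear in `excl`, so a trip with no "void" row at all would be dropped.

```r
library(dplyr)

data <- data.frame(
  trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
  comment = c("void", "", "", "", "", "void", "", "void"),
  paid    = c(0, 100, 100, 250, 250, 0, 125, 0)
)

# Same mutate as in the answer, without the filter, to inspect rn
step <- data %>%
  group_by(trip_id) %>%
  mutate(rn = if_else(comment == "void", NA_integer_, row_number())) %>%
  ungroup()

print(step)
```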

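Answer 2 (score: 0)

A (relatively) simple answer I came up with after posting the question, which also handles the cases where you have multiple "void" rows per trip_id, or no "void" rows for a given trip_id: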

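library(dplyr)

# Void rows, numbered within each trip_id
df_v <- data %>% 
  select(trip_id, comment) %>% 
  filter(comment == "void") %>% 
  group_by(trip_id) %>% 
  mutate(indexed = row_number())

# Non-void rows, numbered the same way
df_nv <- data %>%
  filter(comment != "void") %>%
  group_by(trip_id) %>% 
  mutate(indexed = row_number())

# Drop one non-void row per void row by matching on trip_id and the within-trip index
final <- dplyr::anti_join(df_nv, df_v, by = c("trip_id", "indexed")) %>% select(-indexed)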
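
A quick way to check the two edge cases mentioned above is to run the same anti-join idea on a small made-up dataset. This is only a sketch: the data frame `trips` and the names `voids`, `non_voids` and `remaining` are hypothetical, and it assumes the join key is trip_id. Trip "A" has two "void" rows and three other rows, trip "B" has no "void" rows.

```r
library(dplyr)

# Hypothetical test data, not from the question
trips <- data.frame(
  trip_id = c("A", "A", "A", "A", "A", "B", "B"),
  comment = c("void", "", "void", "", "", "", ""),
  paid    = c(0, 10, 0, 10, 10, 20, 20),
  stringsAsFactors = FALSE
)

# Number the void rows within each trip
voids <- trips %>%
  filter(comment == "void") %>%
  group_by(trip_id) %>%
  mutate(indexed = row_number()) %>%
  ungroup() %>%
  select(trip_id, indexed)

# Number the non-void rows the same way
non_voids <- trips %>%
  filter(comment != "void") %>%
  group_by(trip_id) %>%
  mutate(indexed = row_number()) %>%
  ungroup()

# The i-th non-void row of a trip is dropped if that trip has an i-th void row
remaining <- anti_join(non_voids, voids, by = c("trip_id", "indexed")) %>%
  select(-indexed)

print(remaining)
```

Trip "A" should end up with one row (three non-void rows minus two voids) and trip "B" should keep both of its rows.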