I have this dataset:
data <- data.frame(trip_id = c("456B", "123A", "123A", "456B", "456B", "123A", "789C", "789C"),
comment = c("void", "", "", "", "", "void", "", "void"),
paid = c(0, 100, 100, 250, 250, 0, 125, 0))
print(data)
#trip_id comment paid
# 456B void 0
# 123A 100
# 123A 100
# 456B 250
# 456B 250
# 123A void 0
# 789C 125
# 789C void 0
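I want to programmatically remove the rows with "void" in the comment field and, within each trip_id, one additional row for every "void" in that trip's comment field. For the example, the output would look like this: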
print(solution)
#trip_id comment paid
# 123A 100
# 456B 250
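Answer 0 (score: 2)
After group_by, get the index of the rows with "void" in the comment column and use slice to remove each such row together with the row next to it: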
library(dplyr)
data %>%
group_by(trip_id) %>%
arrange(trip_id, comment != "void") %>%
slice(setdiff(row_number(), which(comment == "void") + 0:1))
# A tibble: 2 x 3
# Groups: trip_id [2]
# trip_id comment paid
# <fct> <fct> <dbl>
#1 123A "" 100
#2 456B "" 250
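To make the index arithmetic inside slice() easier to follow, here is a minimal base R sketch for a single trip. The vector below is a hypothetical stand-in for one group after arrange() has sorted the "void" row to the front; it is only an illustration, not part of the answer above.

```r
# One trip's comment column after arrange(): the "void" row comes first.
comment <- c("void", "", "")

void_idx <- which(comment == "void")   # position of the void row: 1
drop_idx <- void_idx + 0:1             # the void row and the row after it: 1 2
keep_idx <- setdiff(seq_along(comment), drop_idx)

keep_idx
# [1] 3
```

Note that this arithmetic relies on there being a single "void" per trip. With two voids sorted to the front, `which()` returns `c(1, 2)`, so `+ 0:1` yields `c(1, 3)` and the second void row would be kept.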
Answer 1 (score: 2)
Another dplyr solution (not as elegant as @akrun's):
library(dplyr)
# get ids to exclude
excl <- data[ data$comment == "void", "trip_id"]
data %>%
group_by(trip_id) %>%
mutate(rn = if_else(comment == "void", NA_integer_, row_number())) %>%
filter(trip_id %in% excl & rn > min(rn, na.rm = TRUE)) %>%
ungroup() %>%
select(-rn)
# # A tibble: 2 x 3
# trip_id comment paid
# <fct> <fct> <dbl>
# 1 123A "" 100
# 2 456B "" 250
Answer 2 (score: 0)
A (relatively) simple answer I came up with after posting the question. It also handles the cases where a trip_id has multiple voids, or no void at all:
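library(dplyr)
# Number the "void" rows within each trip_id
df_v <- data %>%
select(trip_id, comment) %>%
filter(comment == "void") %>%
group_by(trip_id) %>%
mutate(indexed = row_number())
# Number the remaining rows within each trip_id
df_nv <- data %>%
filter(comment != "void") %>%
group_by(trip_id) %>%
mutate(indexed = row_number())
# Drop the n-th non-void row for the n-th void of the same trip_id
final <- dplyr::anti_join(df_nv, df_v, by = c("trip_id", "indexed")) %>% select(-indexed)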
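As a rough check of the multiple-void and no-void cases mentioned above, the sketch below runs the same number-and-anti_join idea on made-up data. The `trips` data frame and the trip "999Z" are hypothetical and not from the question; `stringsAsFactors = FALSE` is set only to keep the columns as plain character vectors.

```r
library(dplyr)

# Hypothetical data: trip "123A" has two voids and three paid rows,
# trip "999Z" has no void at all.
trips <- data.frame(
  trip_id = c("123A", "123A", "123A", "123A", "123A", "999Z"),
  comment = c("void", "", "void", "", "", ""),
  paid    = c(0, 100, 0, 100, 100, 75),
  stringsAsFactors = FALSE
)

# Number the void rows within each trip.
voids <- trips %>%
  filter(comment == "void") %>%
  group_by(trip_id) %>%
  mutate(indexed = row_number()) %>%
  ungroup() %>%
  select(trip_id, indexed)

# Number the non-void rows the same way, then drop those whose
# (trip_id, indexed) pair matches a void row.
kept <- trips %>%
  filter(comment != "void") %>%
  group_by(trip_id) %>%
  mutate(indexed = row_number()) %>%
  ungroup() %>%
  anti_join(voids, by = c("trip_id", "indexed")) %>%
  select(-indexed)

kept
```

Here two of the three paid "123A" rows are cancelled by the two voids, so `kept` holds one "123A" row and the untouched "999Z" row.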