How to remove rows after the first occurrence when the next row matches

Time: 2019-07-11 12:14:24

Tags: r

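Working in R, I am trying to remove all rows that follow a status change. A company is open for 3 years and then closes, and in the following years the closed flag stays in the table. I want to remove the 2 extra years and keep only the data for the year in which it closed. Some locations close and reopen in the same year, so those rows should not be changed.

I have tried slicing at the minimum date where status = "close", but that does not work because of the locations that reopen.

Sample data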

date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1", "1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "close", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")


start <- data.frame(date, ID, status)

In the data above, I want to remove 2018 and 2019 for ID = 1

date <- c("2014","2015","2016","2017","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")


ideal_outcome <- data.frame(date, ID, status)

1 Answer:

Answer 0: (score: 3)

One way, using rleid from data.table, is to group_by ID and consecutive runs of status, then select all rows for runs where status = "open" and only the first row for runs where it is "close". The mutate step in the code below additionally relabels "close" as "permanently_closed".

library(dplyr)
library(data.table)

start %>%
  group_by(ID, group = rleid(status)) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  mutate(status = replace(as.character(status), status == "close", "permanently_closed")) %>%
  ungroup() %>%
  select(-group)

# A tibble: 15 x 3
#   date  ID    status
#   <fct> <fct> <chr>
# 1 2014  1     open
# 2 2015  1     open
# 3 2016  1     open
# 4 2017  1     permanently_closed
# 5 2016  2     open
# 6 2017  2     open
# 7 2018  2     open
# 8 2019  2     open
# 9 2015  3     open
#10 2016  3     open
#11 2017  3     open
#12 2018  3     permanently_closed
#13 2018  3     open
#14 2019  3     permanently_closed
#15 2019  3     open

However, you do not really need to import data.table for just one function. The behaviour of rleid can also be replicated with base rle:

start %>%
  group_by(ID, group = with(rle(as.character(status)), rep(seq_along(values), lengths))) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup() %>%
  select(-group)

Another way to create the groups, suggested by @Sotos, is to use factor, diff and cumsum.
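The factor/diff/cumsum grouping mentioned above could look roughly like the following. This is a minimal base-R sketch, not necessarily the code @Sotos proposed: the helper name run_id, the temporary group column, and the use of ave() and duplicated() in place of the dplyr pipeline are assumptions made for illustration.

```r
# Sample data from the question, kept as character columns.
date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018","2019",
          "2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1","1","2","2","2","2","3","3","3","3","3","3","3")
status <- c("open","open","open","close","close","close","open","open","open","open",
            "open","open","open","close","open","close","open")
start <- data.frame(date, ID, status, stringsAsFactors = FALSE)

# Hypothetical helper: number consecutive runs of equal values.
# factor() gives integer codes, diff() flags where the code changes,
# and cumsum() turns those change flags into a running run id.
run_id <- function(x) cumsum(c(TRUE, diff(as.integer(factor(x))) != 0))

# Compute the run id separately within each ID.
start$group <- ave(seq_along(start$status), start$ID,
                   FUN = function(i) run_id(start$status[i]))

# Keep every "open" row, but only the first row of each "close" run.
first_in_run <- !duplicated(start[c("ID", "group")])
result <- start[start$status == "open" | first_in_run, c("date", "ID", "status")]
rownames(result) <- NULL
```

On the sample data, result has 15 rows and matches ideal_outcome from the question. Sticking to base R avoids loading dplyr or data.table, at the cost of a slightly less readable grouping step.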