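Working in R, I am trying to remove all rows that follow a status change. A company is open for 3 years and then closes, and in the following years the closed flag stays in the table. I want to remove the 2 extra years and keep only the row for the year it closed. Some locations close and reopen in the same year, so those rows should not be changed.
I have tried slicing at the minimum date where status = "close", but that does not work because of the locations that reopen.
Sample data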
date <- c("2014","2015","2016","2017","2018","2019","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","1", "1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "close", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")
start <- data.frame(date, ID, status)
In the data above, I want to remove the 2018 and 2019 rows for ID = 1, giving:
date <- c("2014","2015","2016","2017","2016","2017","2018","2019","2015","2016","2017","2018","2018","2019","2019")
ID <- c("1","1","1","1","2","2","2","2","3","3","3","3","3","3", "3")
status <- c("open", "open", "open", "close", "open", "open","open","open","open", "open", "open","close", "open", "close", "open")
ideal_outcome <- data.frame(date, ID, status)
Answer 0 (score: 3)
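One way, using rleid from data.table, is to group_by ID and consecutive runs of status, then keep only the first row of each run with status = "close" and all rows of each "open" run. The mutate step below additionally relabels the remaining "close" rows as "permanently_closed"; drop it if status should stay "close" as in ideal_outcome.
library(dplyr)
library(data.table)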
start %>%
  group_by(ID, group = rleid(status)) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  mutate(status = replace(as.character(status),
                          status == "close", "permanently_closed")) %>%
  ungroup() %>%
  select(-group)
# A tibble: 15 x 3
# date ID status
# <fct> <fct> <chr>
# 1 2014 1 open
# 2 2015 1 open
# 3 2016 1 open
# 4 2017 1 permanently_closed
# 5 2016 2 open
# 6 2017 2 open
# 7 2018 2 open
# 8 2019 2 open
# 9 2015 3 open
#10 2016 3 open
#11 2017 3 open
#12 2018 3 permanently_closed
#13 2018 3 open
#14 2019 3 permanently_closed
#15 2019 3 open
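To make the grouping key concrete, here is a minimal sketch of what rleid returns for the status values of ID 3 from the sample data. The vector name status_id3 is only for illustration and assumes data.table is installed. Each change of value starts a new run, so every "close" that is followed by a reopening forms its own run of length 1 and survives the slice step.

```r
library(data.table)

# Status values of ID 3 from the sample data (illustrative variable name)
status_id3 <- c("open", "open", "open", "close", "open", "close", "open")

# rleid() assigns one integer id per consecutive run of equal values
run_id <- rleid(status_id3)
run_id
# [1] 1 1 1 2 3 4 5
```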
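However, you do not really need to import data.table for just one function; the behavior of rleid can also be replicated with base rle:
start %>%
  group_by(ID, group = with(rle(as.character(status)),
                            rep(seq_along(values), lengths))) %>%
  slice(if (first(status) == "open") seq_len(n()) else 1L) %>%
  ungroup() %>%
  select(-group)
@Sotos suggested another way, using factor, diff and cumsum.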
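The code for the factor/diff/cumsum idea is not shown in the thread, so the following is only a sketch of how such an approach might look in base R, not the commenter's exact code. It assumes a new run starts wherever the integer code of status or the ID changes; the names code, new_run, run, keep and result are illustrative.

```r
# Sample data from the question, with status kept as character
start <- data.frame(
  date   = c("2014", "2015", "2016", "2017", "2018", "2019",
             "2016", "2017", "2018", "2019",
             "2015", "2016", "2017", "2018", "2018", "2019", "2019"),
  ID     = c(rep("1", 6), rep("2", 4), rep("3", 7)),
  status = c("open", "open", "open", "close", "close", "close",
             "open", "open", "open", "open",
             "open", "open", "open", "close", "open", "close", "open"),
  stringsAsFactors = FALSE
)

# Integer codes for status, so that diff() can detect a change
code <- as.integer(factor(start$status))

# A new run starts at row 1 and wherever status or ID differs from the previous row
new_run <- c(TRUE, (diff(code) != 0) | (start$ID[-1] != start$ID[-nrow(start)]))
run <- cumsum(new_run)

# Keep every "open" row, and only the first row of each "close" run
keep <- start$status == "open" | !duplicated(run)
result <- start[keep, ]
rownames(result) <- NULL
result
```

On the sample data this keeps 15 rows and drops only the 2018 and 2019 rows of ID 1, which matches ideal_outcome. Unlike the dplyr versions, the ID boundary is handled explicitly here because nothing is grouped.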