My data frame looks like this:
test <-
  data.frame(
    id = c(4, 6, 9, 12, 14, 15),
    dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
    staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO")
  )
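What I want is, for each REGULAR or ANOTHER row, the date of the last PENDING or PENDING TOO status that occurs before the next REGULAR/ANOTHER row.
In other words, the result should be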
result <-
  data.frame(
    id = c(4, 6, 9, 12, 14, 15),
    dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
    staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO"),
    staus_summary = c("2019-01-02", NA, "2019-01-06", NA, NA, NA)
  )
This is what I have so far, but my problem is that the status I need is not always on the very next row.
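library(dplyr)
# Column names follow the data frame above (staus, not status).
# lag() only looks one row back, which is the limitation described above.
result <- test %>%
  mutate(
    staus_summary = if_else(staus %in% c("REGULAR", "ANOTHER") & lag(staus) %in% c("PENDING", "PENDING TOO"), as.character(dates), NA_character_)
  )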
Answer 0 (score: 1)
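One way is to start a new group at every occurrence of "REGULAR" or "ANOTHER", and then replace the first value in each group with the last dates value of that group (using last()).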
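To make the grouping step concrete, here is a minimal base R sketch of how the running sum turns the status values into group ids. The vector is copied from the question; the names starts and group are only illustrative.

```r
# Status values copied from the question's staus column.
staus <- c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO")

# TRUE wherever a new block starts (a REGULAR or ANOTHER row).
starts <- staus %in% c("REGULAR", "ANOTHER")

# The running sum of those flags gives each row the id of its block.
group <- cumsum(starts)
group
# [1] 1 1 2 2 2 2
```

Rows 1-2 form the first block and rows 3-6 the second, so the last date of each block is the date of its final pending row.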
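library(dplyr)
test %>%
  # A new group starts at every REGULAR/ANOTHER row.
  group_by(group = cumsum(staus %in% c("REGULAR", "ANOTHER"))) %>%
  # ifelse() drops the Date class, so convert the numeric result back;
  # the explicit origin keeps this working on R versions before 4.3.
  mutate(staus_summary = as.Date(ifelse(row_number() == 1,
                                        last(dates), NA_real_),
                                 origin = "1970-01-01")) %>%
  ungroup() %>%
  select(-group)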
#     id dates      staus       staus_summary
#  <dbl> <date>     <fct>       <date>
#1     4 2019-01-01 REGULAR     2019-01-02
#2     6 2019-01-02 PENDING     NA
#3     9 2019-01-03 ANOTHER     2019-01-06
#4    12 2019-01-04 PENDING     NA
#5    14 2019-01-05 PENDING     NA
#6    15 2019-01-06 PENDING TOO NA
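As a possible variant (not part of the original answer), dplyr's if_else() can replace ifelse(): it requires both branches to have the same type, so the column stays a Date and the conversion back through as.Date() is not needed. The sketch below rebuilds the question's test data so it runs on its own.

```r
library(dplyr)

# Data copied from the question.
test <- data.frame(
  id = c(4, 6, 9, 12, 14, 15),
  dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
  staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO")
)

result <- test %>%
  # Same grouping as above: a new group at every REGULAR/ANOTHER row.
  group_by(group = cumsum(staus %in% c("REGULAR", "ANOTHER"))) %>%
  # Both branches are Dates, so the Date class is preserved.
  mutate(staus_summary = if_else(row_number() == 1, last(dates), as.Date(NA))) %>%
  ungroup() %>%
  select(-group)
```

Note that with either version, a REGULAR/ANOTHER row that is immediately followed by another REGULAR/ANOTHER row forms a group of one and receives its own date. If that case can occur in the real data, it probably needs an extra condition on the status of the last row in the group.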