Lead where the string matches

Date: 2019-10-06 14:35:19

Tags: r dplyr

My data frame looks like this:

  test <-
    data.frame(
      id = c(4, 6, 9, 12, 14, 15),
      dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
      staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO")
    )

For each row whose status is REGULAR or ANOTHER, I want the date of the last PENDING or PENDING TOO row that occurs before the next REGULAR / ANOTHER row.

In other words, the result should be

  result <-
    data.frame(
      id = c(4, 6, 9, 12, 14, 15),
      dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
      staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO"),
      staus_summary = c("2019-01-02", NA, "2019-01-06", NA, NA, NA)
  )

This is what I have so far, but my problem is that the status I need will not always be on the very next row.

  library(dplyr)

  # Attempt: flag REGULAR / ANOTHER rows whose previous row is PENDING-type
  result <- test %>%
    mutate(
      staus_summary = if_else(staus %in% c("REGULAR", "ANOTHER") & lag(staus) %in% c("PENDING", "PENDING TOO"), as.character(dates), NA_character_)
    )

1 answer:

Answer 0: (score: 1)

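One way is to start a new group every time "REGULAR" or "ANOTHER" occurs, and then set the first row of each group to the last dates value in that group (every other row gets NA).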

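library(dplyr)

test %>%
  # Start a new group at every REGULAR / ANOTHER row
  group_by(group = cumsum(staus %in% c("REGULAR", "ANOTHER"))) %>%
  # ifelse() drops the Date class, so convert back with an explicit origin
  mutate(staus_summary = as.Date(ifelse(row_number() == 1, 
                         last(dates), NA_real_), origin = "1970-01-01")) %>%
  ungroup() %>%
  select(-group)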

#     id dates      staus      staus_summary
#   <dbl> <date>     <fct>       <date>       
#1     4 2019-01-01 REGULAR     2019-01-02   
#2     6 2019-01-02 PENDING     NA           
#3     9 2019-01-03 ANOTHER     2019-01-06   
#4    12 2019-01-04 PENDING     NA           
#5    14 2019-01-05 PENDING     NA           
#6    15 2019-01-06 PENDING TOO NA
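
To make the grouping step easier to follow, here is a minimal sketch that first prints the group ids produced by cumsum() and then applies a slightly stricter variant of the same idea. The names starts, pending, last_pending and checked are hypothetical and not part of the answer. The variant assumes that only PENDING-type rows should supply the date, so a REGULAR / ANOTHER row with no pending rows after it would get NA instead of its own date.

```r
library(dplyr)

# Same data as in the question (the column name `staus` is kept as posted).
test <- data.frame(
  id = c(4, 6, 9, 12, 14, 15),
  dates = seq(as.Date("2019-01-01"), as.Date("2019-01-06"), "days"),
  staus = c("REGULAR", "PENDING", "ANOTHER", "PENDING", "PENDING", "PENDING TOO"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vectors (assumptions for this sketch).
starts  <- c("REGULAR", "ANOTHER")
pending <- c("PENDING", "PENDING TOO")

# Step 1: cumsum() over a logical vector increases by one at every TRUE,
# so each REGULAR / ANOTHER row opens a new group.
group <- cumsum(test$staus %in% starts)
print(group)
# [1] 1 1 2 2 2 2

# Step 2: within each group, take the date of the last PENDING-type row
# (NA if the group has none) and write it only on the row that opened the group.
checked <- test %>%
  group_by(group = cumsum(staus %in% starts)) %>%
  mutate(
    last_pending = if (any(staus %in% pending)) {
      last(dates[staus %in% pending])
    } else {
      as.Date(NA)
    },
    staus_summary = if_else(staus %in% starts, last_pending, as.Date(NA))
  ) %>%
  ungroup() %>%
  select(-group, -last_pending)

print(checked)
```

On the sample data this should give the same staus_summary column as the answer above. The two approaches only differ when a group has no pending rows, or when the data begins with a pending row before any REGULAR / ANOTHER row.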