Using dplyr

Time: 2019-02-27 15:13:57

Tags: r dplyr

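I want to find the first row of a value, but only when it occurs after another value. I have a time series dataset of nest box use, and for each box I want to filter to the row where the box is first recorded as empty after having been occupied. Here is a simplified example of the data: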

# A tibble: 20 x 3
   NestID Date       Status  
   <chr>  <date>     <chr>   
 1 WA18   2019-02-01 Empty   
 2 WA18   2019-02-02 Empty   
 3 WA18   2019-02-03 Empty   
 4 WA18   2019-02-04 Occupied
 5 WA18   2019-02-05 Occupied
 6 WA18   2019-02-06 Occupied
 7 WA18   2019-02-07 Empty   
 8 WA18   2019-02-08 Empty 

dat <- structure(list(NestID = c("WA18", "WA18", "WA18", "WA18", "WA18", 
    "WA18", "WA18", "WA18", "WA18", "WA20", "WA20", "WA20", "WA20", 
    "WA20", "WA20", "WA20", "WA20", "WA20", "WA20", "WA20"), Date = structure(c(17928, 
    17929, 17930, 17931, 17932, 17933, 17934, 17935, 17936, 17555, 
    17556, 17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564, 
    17565), class = "Date"), Status = c("Empty", "Empty", "Empty", 
    "Occupied", "Occupied", "Occupied", "Empty", "Empty", "Empty", 
    "Empty", "Empty", "Empty", "Empty", "Empty", "Empty", "Occupied", 
    "Occupied", "Empty", "Empty", "Empty")), class = c("tbl_df", 
    "tbl", "data.frame"), row.names = c(NA, -20L))

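So for nest WA18, I want to filter to the row with the date 2019-02-07 (the first time the box is recorded as empty after being occupied). I'm not sure what the best way to index that row is, but I'd like to do it with dplyr.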

2 answers:

Answer 0: (score: 3)

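You can use lag to get the value from the previous row, then keep the rows that are "Empty" and whose previous row within the same nest was "Occupied":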

library(dplyr)

dat %>%
  group_by(NestID) %>%
  # keep rows that are Empty and whose previous row in the same nest was Occupied
  filter(Status == "Empty" &
           lag(Status) == "Occupied")


#    NestID Date       Status
#    <chr>  <date>     <chr> 
#  1 WA18   2019-02-07 Empty 
#  2 WA20   2018-02-01 Empty 
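
Note that the filter above keeps every Empty-after-Occupied row in a group, which happens to be one row per nest in the sample data. If a box can be occupied more than once, a possible way to keep only the first such row is to sort by date and add slice(1). This is a minimal sketch on made-up data: the nests data frame and the name first_vacated are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical data: a single nest box with two occupancy cycles
nests <- data.frame(
  NestID = rep("WA18", 8),
  Date   = as.Date("2019-02-01") + 0:7,
  Status = c("Empty", "Occupied", "Empty", "Empty",
             "Occupied", "Occupied", "Empty", "Empty"),
  stringsAsFactors = FALSE
)

first_vacated <- nests %>%
  group_by(NestID) %>%
  arrange(Date, .by_group = TRUE) %>%   # lag() depends on row order, so sort within each nest
  filter(Status == "Empty", lag(Status) == "Occupied") %>%
  slice(1) %>%                          # keep only the first transition per nest
  ungroup()

first_vacated
```

Without slice(1), this example would return two rows for WA18, one for each time the box was vacated.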

Answer 1: (score: 2)

Using data.table:

library(data.table)

setDT(dat)[, .SD[Status == "Empty" & shift(Status) == "Occupied"], by = NestID]

Output:

   NestID       Date Status
1:   WA18 2019-02-07  Empty
2:   WA20 2018-02-01  Empty
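
To see why the first row of each nest can never match, it may help to look at what shift returns on a plain vector. This is a small illustration with a made-up status vector (the names status, prev and keep are hypothetical): with its default arguments, shift lags by one position and pads the start with NA.

```r
library(data.table)

# Hypothetical status sequence for one nest box
status <- c("Empty", "Occupied", "Empty")

# Default shift(): lag by one position, pad the first element with NA
prev <- shift(status)

# TRUE only where the current value is "Empty" and the previous one was "Occupied";
# the first element compares against NA, so it is NA rather than TRUE
keep <- status == "Empty" & prev == "Occupied"

which(keep)  # positions that satisfy the condition
```

Because the comparison is done per NestID, the last row of one nest is never treated as the predecessor of the first row of the next.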