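I want to find the first row containing a value, but only when it occurs after another value. I have a time-series dataset of nest box usage, and for each box I want to filter to the row where the box first becomes empty after having been occupied. Here is a simplified example of the data: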
# A tibble: 20 x 3
NestID Date Status
<chr> <date> <chr>
1 WA18 2019-02-01 Empty
2 WA18 2019-02-02 Empty
3 WA18 2019-02-03 Empty
4 WA18 2019-02-04 Occupied
5 WA18 2019-02-05 Occupied
6 WA18 2019-02-06 Occupied
7 WA18 2019-02-07 Empty
8 WA18 2019-02-08 Empty
dat <- structure(list(NestID = c("WA18", "WA18", "WA18", "WA18", "WA18",
"WA18", "WA18", "WA18", "WA18", "WA20", "WA20", "WA20", "WA20",
"WA20", "WA20", "WA20", "WA20", "WA20", "WA20", "WA20"), Date = structure(c(17928,
17929, 17930, 17931, 17932, 17933, 17934, 17935, 17936, 17555,
17556, 17557, 17558, 17559, 17560, 17561, 17562, 17563, 17564,
17565), class = "Date"), Status = c("Empty", "Empty", "Empty",
"Occupied", "Occupied", "Occupied", "Empty", "Empty", "Empty",
"Empty", "Empty", "Empty", "Empty", "Empty", "Empty", "Occupied",
"Occupied", "Empty", "Empty", "Empty")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L))
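So for nest WA18, I want to filter to the row with the date 2019-02-07 (the first time the box is recorded as empty after being occupied). I'm not sure what the best way to index that row is, but I'd like to do it with dplyr.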
Answer 0 (score: 3)
You can use lag to get the value of the previous row:
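library(dplyr)

dat %>%
  group_by(NestID) %>%
  # keep rows that are "Empty" and directly follow an "Occupied" row;
  # lag() is NA on each group's first row, so filter() drops that row
  filter(Status == "Empty" &
           lag(Status) == "Occupied")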
# NestID Date Status
# <chr> <date> <chr>
# 1 WA18 2019-02-07 Empty
# 2 WA20 2018-02-01 Empty
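One caveat: the filter above keeps every Occupied-to-Empty transition, so a box that is vacated more than once would return several rows. Below is a minimal sketch of keeping only the earliest transition per box, assuming the rows should be ordered by Date within each NestID. The toy data frame and the box name NB01 are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical toy data: one box (NB01) that is vacated twice.
toy <- data.frame(
  NestID = "NB01",
  Date   = as.Date("2019-03-01") + 0:6,
  Status = c("Empty", "Occupied", "Empty", "Occupied",
             "Occupied", "Empty", "Empty"),
  stringsAsFactors = FALSE
)

first_vacated <- toy %>%
  arrange(NestID, Date) %>%          # make sure lag() sees chronological order
  group_by(NestID) %>%
  filter(Status == "Empty",
         lag(Status) == "Occupied") %>%  # every Occupied -> Empty transition
  slice(1) %>%                       # keep only the earliest one per box
  ungroup()

first_vacated
```

Here slice(1) runs per group, so each box contributes at most one row, and boxes that were never occupied contribute none.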
Answer 1 (score: 2)
Using data.table:
library(data.table)
setDT(dat)[, .SD[Status == "Empty" & shift(Status) == "Occupied"], by = NestID]
Output:
NestID Date Status
1: WA18 2019-02-07 Empty
2: WA20 2018-02-01 Empty
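As with the lag version, this returns every Occupied-to-Empty transition in a group. A rough data.table sketch that keeps only the first one per box might look like the following. The toy table and the box name NB01 are hypothetical, and head() is used rather than [1] on the assumption that a box with no transition should yield no row at all.

```r
library(data.table)

# Hypothetical toy data: one box (NB01) that is vacated twice.
toy_dt <- data.table(
  NestID = "NB01",
  Date   = as.Date("2019-03-01") + 0:6,
  Status = c("Empty", "Occupied", "Empty", "Occupied",
             "Occupied", "Empty", "Empty")
)

# Sort chronologically, find each Occupied -> Empty transition within a box,
# then keep only the earliest one with head(..., 1).
first_vacated_dt <- toy_dt[order(NestID, Date)][
  , head(.SD[Status == "Empty" & shift(Status) == "Occupied"], 1),
  by = NestID
]

first_vacated_dt
```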