我目前有两个示例数据集:score
和 status
。在真实数据集中,有两个以上的人。
数据
score
Person Score ScoreDateTime
Person A 18 12/6/2020 5:06
Person A 18 12/6/2020 8:00
Person A 20 12/7/2020 7:40
Person B 18 12/27/2020 1:56
Person B 19 12/27/2020 20:00
Person B 18 12/28/2020 20:00
Person B 20 12/29/2020 20:00
status
Person Status StatusDateTime
Person A Woke up 12/7/2020 2:00
Person A Woke up 12/7/2020 4:00
Person A Woke up 12/7/2020 6:00
Person A Woke up 12/7/2020 8:00
Person A Woke up 12/7/2020 10:00
Person B Woke up 12/27/2020 5:00
Person B Woke up 12/28/2020 2:00
Person B Woke up 12/28/2020 4:00
Person B Woke up 12/28/2020 22:00
Person B Woke up 12/28/2020 23:59
我想过滤 status
以仅保留 ScoreDateTimes
之间的行,其中 Score
小于 19。
时间范围应该是从 Score
小于 19 到 Score
大于等于 19。所以:
所需的 status
数据集
Person Status StatusDateTime
Person A Woke up 12/7/2020 2:00
Person A Woke up 12/7/2020 4:00
Person A Woke up 12/7/2020 6:00
Person B Woke up 12/27/2020 5:00
Person B Woke up 12/28/2020 22:00
Person B Woke up 12/28/2020 23:59
非常感谢任何帮助(dplyr 方法会很棒)。
代码
score <- structure(list(Person = c("Person A", "Person A", "Person A",
"Person B", "Person B", "Person B", "Person B"), Score = c(18L,
18L, 20L, 18L, 19L, 18L, 20L), ScoreDateTime = c("12/6/2020 5:06",
"12/6/2020 8:00", "12/7/2020 7:40", "12/27/2020 1:56", "12/27/2020 20:00",
"12/28/2020 20:00", "12/29/2020 20:00")), class = "data.frame", row.names = c(NA,
-7L))
status <- structure(list(Person = c("Person A", "Person A", "Person A",
"Person A", "Person A", "Person B", "Person B", "Person B", "Person B",
"Person B", "Person B", "Person B"), Status = c("Woke up", "Woke up",
"Woke up", "Woke up", "Woke up", "Woke up", "Woke up", "Woke up",
"Woke up", "Woke up", "Woke up", "Woke up"), StatusDateTime = c("12/7/2020 2:00",
"12/7/2020 4:00", "12/7/2020 6:00", "12/7/2020 8:00", "12/7/2020 10:00",
"12/27/2020 5:00", "12/28/2020 2:00", "12/28/2020 4:00", "12/28/2020 22:00",
"12/28/2020 23:59", "12/29/2020 2:00", "12/29/2020 22:00")), class = "data.frame", row.names = c(NA,
-12L))
答案 0 :(得分:1)
这是一个 data.table
方法
library(date.table)
# Sample data -------------------
score <- fread("Person Score ScoreDateTime
PersonA 18 12/6/2020T5:06
PersonA 18 12/6/2020T8:00
PersonA 20 12/7/2020T7:40
PersonB 18 12/27/2020T1:56
PersonB 19 12/27/2020T20:00
PersonB 18 12/28/2020T20:00
PersonB 20 12/29/2020T20:00")
status <- fread("Person Status StatusDateTime
PersonA Wokeup 12/7/2020T2:00
PersonA Wokeup 12/7/2020T4:00
PersonA Wokeup 12/7/2020T6:00
PersonA Wokeup 12/7/2020T8:00
PersonA Wokeup 12/7/2020T10:00
PersonB Wokeup 12/27/2020T5:00
PersonB Wokeup 12/28/2020T2:00
PersonB Wokeup 12/28/2020T4:00
PersonB Wokeup 12/28/2020T22:00
PersonB Wokeup 12/28/2020T23:59")
score[, ScoreDateTime := as.POSIXct(ScoreDateTime, format = "%m/%d/%YT%H:%M")]
status[, StatusDateTime := as.POSIXct(StatusDateTime, format = "%m/%d/%YT%H:%M")]
# Code ---------------------
score[, ScoreDateTime2 := shift(ScoreDateTime, n = 1, type = "lead") - 1, by = Person]
status[ score[ Score <= 18, ], match := 1, on = .(Person, StatusDateTime >= ScoreDateTime, StatusDateTime <= ScoreDateTime2)]
status[ match == 1,][, match := NULL][]
# Person Status StatusDateTime
# 1: PersonA Wokeup 2020-12-07 02:00:00
# 2: PersonA Wokeup 2020-12-07 04:00:00
# 3: PersonA Wokeup 2020-12-07 06:00:00
# 4: PersonB Wokeup 2020-12-27 05:00:00
# 5: PersonB Wokeup 2020-12-28 22:00:00
# 6: PersonB Wokeup 2020-12-28 23:59:00