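Filter a dataset based on conditional time ranges from another dataset

Time: 2021-04-01 17:35:50

Tags: r dplyr

I currently have two sample datasets: score and status. The real datasets contain more than two people.

Data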

score

Person      Score ScoreDateTime
Person A    18    12/6/2020 5:06        
Person A    18    12/6/2020 8:00        
Person A    20    12/7/2020 7:40        
Person B    18    12/27/2020 1:56       
Person B    19    12/27/2020 20:00      
Person B    18    12/28/2020 20:00      
Person B    20    12/29/2020 20:00

status

Person      Status   StatusDateTime
Person A    Woke up  12/7/2020 2:00     
Person A    Woke up  12/7/2020 4:00     
Person A    Woke up  12/7/2020 6:00     
Person A    Woke up  12/7/2020 8:00     
Person A    Woke up  12/7/2020 10:00        
Person B    Woke up  12/27/2020 5:00        
Person B    Woke up  12/28/2020 2:00        
Person B    Woke up  12/28/2020 4:00        
Person B    Woke up  12/28/2020 22:00       
Person B    Woke up  12/28/2020 23:59

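I want to filter status so that it keeps only the rows whose StatusDateTime falls between ScoreDateTime values during which Score is below 19.

Each time range should run from a reading where Score is below 19 to the next reading where Score is 19 or above. So:

  • For Person A, I want all rows between 12/6/2020 5:06 and 12/7/2020 7:40
  • For Person B, I want all rows between 12/27/2020 1:56 and 12/27/2020 20:00, and between 12/28/2020 20:00 and 12/29/2020 20:00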
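
To make the intended ranges concrete, here is a minimal sketch of how they could be derived from score with dplyr. The column names RangeStart and RangeEnd and the UTC time zone are assumptions introduced for illustration. Each low-score reading is paired with the same person's next reading, so consecutive low readings produce adjacent ranges that together cover the spans listed above.

```r
library(dplyr)

score <- data.frame(
  Person = c(rep("Person A", 3), rep("Person B", 4)),
  Score = c(18L, 18L, 20L, 18L, 19L, 18L, 20L),
  ScoreDateTime = c("12/6/2020 5:06", "12/6/2020 8:00", "12/7/2020 7:40",
                    "12/27/2020 1:56", "12/27/2020 20:00",
                    "12/28/2020 20:00", "12/29/2020 20:00")
)

intervals <- score %>%
  # Parse the timestamps (time zone is an assumption; the data does not state one)
  mutate(ScoreDateTime = as.POSIXct(ScoreDateTime,
                                    format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  group_by(Person) %>%
  arrange(ScoreDateTime, .by_group = TRUE) %>%
  # A range ends at the same person's next score reading
  mutate(RangeEnd = lead(ScoreDateTime)) %>%
  ungroup() %>%
  # Keep only ranges that start at a low score and have a following reading
  filter(Score < 19, !is.na(RangeEnd)) %>%
  select(Person, RangeStart = ScoreDateTime, RangeEnd)

print(intervals)
```

For Person A this yields two adjacent ranges (5:06 to 8:00 on 12/6, then 8:00 on 12/6 to 7:40 on 12/7), which together span the single range described in the first bullet.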

Desired status dataset

Person      Status   StatusDateTime
Person A    Woke up  12/7/2020 2:00     
Person A    Woke up  12/7/2020 4:00     
Person A    Woke up  12/7/2020 6:00  
Person B    Woke up  12/27/2020 5:00   
Person B    Woke up  12/28/2020 22:00       
Person B    Woke up  12/28/2020 23:59

Any help is greatly appreciated (a dplyr approach would be great).

Code

score <- structure(list(Person = c("Person A", "Person A", "Person A", 
"Person B", "Person B", "Person B", "Person B"), Score = c(18L, 
18L, 20L, 18L, 19L, 18L, 20L), ScoreDateTime = c("12/6/2020 5:06", 
"12/6/2020 8:00", "12/7/2020 7:40", "12/27/2020 1:56", "12/27/2020 20:00", 
"12/28/2020 20:00", "12/29/2020 20:00")), class = "data.frame", row.names = c(NA, 
-7L))

status <- structure(list(Person = c("Person A", "Person A", "Person A", 
"Person A", "Person A", "Person B", "Person B", "Person B", "Person B", 
"Person B", "Person B", "Person B"), Status = c("Woke up", "Woke up", 
"Woke up", "Woke up", "Woke up", "Woke up", "Woke up", "Woke up", 
"Woke up", "Woke up", "Woke up", "Woke up"), StatusDateTime = c("12/7/2020 2:00", 
"12/7/2020 4:00", "12/7/2020 6:00", "12/7/2020 8:00", "12/7/2020 10:00", 
"12/27/2020 5:00", "12/28/2020 2:00", "12/28/2020 4:00", "12/28/2020 22:00", 
"12/28/2020 23:59", "12/29/2020 2:00", "12/29/2020 22:00")), class = "data.frame", row.names = c(NA, 
-12L))

1 answer:

Answer 0 (score: 1)

Here is a data.table approach

library(data.table)
# Sample data -------------------
score <- fread("Person      Score ScoreDateTime
PersonA    18    12/6/2020T5:06        
PersonA    18    12/6/2020T8:00        
PersonA    20    12/7/2020T7:40        
PersonB    18    12/27/2020T1:56       
PersonB    19    12/27/2020T20:00      
PersonB    18    12/28/2020T20:00      
PersonB    20    12/29/2020T20:00")
status <- fread("Person      Status   StatusDateTime
PersonA    Wokeup  12/7/2020T2:00     
PersonA    Wokeup  12/7/2020T4:00     
PersonA    Wokeup  12/7/2020T6:00     
PersonA    Wokeup  12/7/2020T8:00     
PersonA    Wokeup  12/7/2020T10:00        
PersonB    Wokeup  12/27/2020T5:00        
PersonB    Wokeup  12/28/2020T2:00        
PersonB    Wokeup  12/28/2020T4:00        
PersonB    Wokeup  12/28/2020T22:00       
PersonB    Wokeup  12/28/2020T23:59")
score[, ScoreDateTime := as.POSIXct(ScoreDateTime, format = "%m/%d/%YT%H:%M")]
status[, StatusDateTime := as.POSIXct(StatusDateTime, format = "%m/%d/%YT%H:%M")]

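# Code ---------------------
# End of each range: one second before the same person's next score reading
score[, ScoreDateTime2 := shift(ScoreDateTime, n = 1, type = "lead") - 1, by = Person]
# Non-equi update join: flag status rows that fall inside a range starting at a score <= 18
status[ score[ Score <= 18, ], match := 1, on = .(Person, StatusDateTime >= ScoreDateTime, StatusDateTime <= ScoreDateTime2)]
# Keep the flagged rows and drop the helper column
status[ match == 1,][, match := NULL][]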
#     Person Status      StatusDateTime
# 1: PersonA Wokeup 2020-12-07 02:00:00
# 2: PersonA Wokeup 2020-12-07 04:00:00
# 3: PersonA Wokeup 2020-12-07 06:00:00
# 4: PersonB Wokeup 2020-12-27 05:00:00
# 5: PersonB Wokeup 2020-12-28 22:00:00
# 6: PersonB Wokeup 2020-12-28 23:59:00
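
Since the question asked for dplyr, the same idea can be sketched with a non-equi join. This is only a sketch under stated assumptions: it needs dplyr 1.1.0 or later for join_by(), and the helper parse_dt, the columns RangeStart and RangeEnd, and the UTC time zone are names and choices introduced here, not part of the original post. It uses the ten status rows shown in the question's table.

```r
library(dplyr)  # join_by() assumes dplyr >= 1.1.0

# Hypothetical helper: parse the question's month/day/year timestamps
parse_dt <- function(x) as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

score <- data.frame(
  Person = c(rep("Person A", 3), rep("Person B", 4)),
  Score = c(18L, 18L, 20L, 18L, 19L, 18L, 20L),
  ScoreDateTime = parse_dt(c("12/6/2020 5:06", "12/6/2020 8:00",
                             "12/7/2020 7:40", "12/27/2020 1:56",
                             "12/27/2020 20:00", "12/28/2020 20:00",
                             "12/29/2020 20:00"))
)

status <- data.frame(
  Person = rep(c("Person A", "Person B"), each = 5),
  Status = "Woke up",
  StatusDateTime = parse_dt(c("12/7/2020 2:00", "12/7/2020 4:00",
                              "12/7/2020 6:00", "12/7/2020 8:00",
                              "12/7/2020 10:00", "12/27/2020 5:00",
                              "12/28/2020 2:00", "12/28/2020 4:00",
                              "12/28/2020 22:00", "12/28/2020 23:59"))
)

# Ranges that start at a low score and end at the person's next reading
low_ranges <- score %>%
  group_by(Person) %>%
  arrange(ScoreDateTime, .by_group = TRUE) %>%
  mutate(RangeEnd = lead(ScoreDateTime)) %>%
  ungroup() %>%
  filter(Score < 19, !is.na(RangeEnd)) %>%
  select(Person, RangeStart = ScoreDateTime, RangeEnd)

# Keep status rows with RangeStart <= StatusDateTime < RangeEnd
result <- status %>%
  inner_join(low_ranges,
             by = join_by(Person,
                          StatusDateTime >= RangeStart,
                          StatusDateTime < RangeEnd)) %>%
  select(Person, Status, StatusDateTime)

print(result)
```

The half-open upper bound (StatusDateTime < RangeEnd) plays the same role as subtracting one second in the data.table version: a status logged exactly at the next score reading is not counted in the preceding range. Because a person's ranges never overlap, each status row matches at most one range, so the join does not duplicate rows.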