Calculating the difference between two timestamps using lead and dplyr

Time: 2019-05-01 23:36:51

Tags: r dplyr lead

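I want to find the timestamp in a column for a row that meets one condition, "Start", then find the timestamp of the first later row in the same column that meets another condition, "Stop", and take the difference between the two. Basically, we use a program to "start" a behavior and "stop" a behavior so that we can calculate the behavior's duration.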

I have tried adapting the code from this post: subtract value from previous row by group

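But I can't figure out how to make lead look ahead to the next row in the same column that meets a condition. This is complicated by the fact that there may be "event" behaviors that have a "Start" but no "Stop". Example data frame: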

Data
Behavior             Modifier_1           Time_relative_s              
BodyLength           Start                122.11      
Growl                Start                129.70
Body Length          Stop                 132.26      
Body Length          Start                157.79      
Body Length          Stop                 258.85      
Body Length          Start                270.12    
Bark                 Start                272.26
Growl                Start                275.68
Body Length          Stop                 295.37

I would like this:

Behavior             Modifier_1           Time_relative_s       diff             
BodyLength           Start                122.11                10.15
Growl                Start                129.70                 
Body Length          Stop                 132.26                
Body Length          Start                157.79                101.06  
Body Length          Stop                 258.85      
Body Length          Start                270.12                25.25    
Bark                 Start                272.26
Growl                Start                275.68
Body Length          Stop                 295.37

I tried using a dplyr pipe:

test<-u%>%
    filter(Modifier_1 %in% c("Start","Stop")) %>%
    arrange(Time_Relative_s) %>%
    mutate(diff = lead(Time_Relative_s, default = first(Time_Relative_s == "Stop") - Time_Relative_s))

But I must not be using lead correctly, because this only returns Time_Relative_s in the diff column. Any suggestions? Thanks for your help!

1 answer:

Answer 0: (score: 2)

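We may need to create a grouping variable based on the occurrence of 'Stop', and then, within each group, take the difference between the 'Time_relative_s' values at the positions of the first 'Stop' and the first 'Start' in 'Modifier_1'.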

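library(dplyr)
df1 %>% 
   # start a new group on the row after each "Stop"
   group_by(grp = cumsum(lag(Modifier_1 == "Stop", default = FALSE))) %>% 
   # duration = first "Stop" time minus first "Start" time within the group
   mutate(diff = Time_relative_s[match("Stop", Modifier_1)] - 
                  Time_relative_s[match("Start", Modifier_1)], 
          # keep the duration only on the first row of each group
          diff = replace(diff, row_number() > 1, NA_real_)) %>%
   ungroup() %>%
   select(-grp)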
# A tibble: 9 x 4
#  Behavior    Modifier_1 Time_relative_s  diff
#  <chr>       <chr>                <dbl> <dbl>
#1 BodyLength  Start                 122.  10.1
#2 Growl       Start                 130.  NA  
#3 Body Length Stop                  132.  NA  
#4 Body Length Start                 158. 101. 
#5 Body Length Stop                  259.  NA  
#6 Body Length Start                 270.  25.2
#7 Bark        Start                 272.  NA  
#8 Growl       Start                 276.  NA  
#9 Body Length Stop                  295.  NA  

Data

df1 <- structure(list(Behavior = c("BodyLength", "Growl", "Body Length", 
"Body Length", "Body Length", "Body Length", "Bark", "Growl", 
"Body Length"), Modifier_1 = c("Start", "Start", "Stop", "Start", 
"Stop", "Start", "Start", "Start", "Stop"), Time_relative_s = c(122.11, 
129.7, 132.26, 157.79, 258.85, 270.12, 272.26, 275.68, 295.37
)), row.names = c(NA, -9L), class = "data.frame")
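To make the grouping step in the answer easier to follow, here is a minimal base-R sketch of the same idea with the intermediate values exposed. It is an illustration, not the answer's own code. The names `is_stop`, `grp`, and `group_diff` are introduced here for clarity, and `c(FALSE, head(is_stop, -1))` is used as a stand-in for `lag(..., default = FALSE)`.

```r
# Rebuild the example data so the sketch is self-contained.
df1 <- data.frame(
  Behavior = c("BodyLength", "Growl", "Body Length", "Body Length",
               "Body Length", "Body Length", "Bark", "Growl", "Body Length"),
  Modifier_1 = c("Start", "Start", "Stop", "Start", "Stop",
                 "Start", "Start", "Start", "Stop"),
  Time_relative_s = c(122.11, 129.7, 132.26, 157.79, 258.85,
                      270.12, 272.26, 275.68, 295.37),
  stringsAsFactors = FALSE
)

# TRUE on rows that close an interval.
is_stop <- df1$Modifier_1 == "Stop"

# Shift the flag down by one row, then take the running sum, so the
# group counter increases on the row *after* each "Stop".
grp <- cumsum(c(FALSE, head(is_stop, -1)))
grp

# Per group: time of the first "Stop" minus time of the first "Start".
# match() returns NA when the value is absent, so a group that has a
# "Start" but no "Stop" gives NA instead of an error.
group_diff <- sapply(split(df1, grp), function(g) {
  g$Time_relative_s[match("Stop", g$Modifier_1)] -
    g$Time_relative_s[match("Start", g$Modifier_1)]
})

# Keep the duration only on the first row of each group.
df1$diff <- ifelse(duplicated(grp), NA_real_, group_diff[as.character(grp)])
df1
```

Printing `grp` shows which rows are treated as one interval. Every "Start" between two "Stop" rows falls into the same group, which is why only the first "Start" of each group receives a duration.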