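I want to find the difference between two timestamps: first find the timestamp in a column based on the condition "Start", then find the timestamp of the first following row in that same column that meets another condition, "Stop". Basically, we use a program to "Start" a behavior and "Stop" a behavior so that we can calculate the duration of the behavior.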
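I have tried adapting the code from this post: subtract value from previous row by group
But I can't figure out how to get lead() to look for a condition in the upcoming rows of the same column. This is complicated by the fact that there can be "event" behaviors that have a "Start" but no "Stop". Example data frame: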
Data
Behavior Modifier_1 Time_relative_s
BodyLength Start 122.11
Growl Start 129.70
Body Length Stop 132.26
Body Length Start 157.79
Body Length Stop 258.85
Body Length Start 270.12
Bark Start 272.26
Growl Start 275.68
Body Length Stop 295.37
I want this:
Behavior Modifier_1 Time_relative_s diff
BodyLength Start 122.11 10.15
Growl Start 129.70
Body Length Stop 132.26
Body Length Start 157.79 101.06
Body Length Stop 258.85
Body Length Start 270.12 25.25
Bark Start 272.26
Growl Start 275.68
Body Length Stop 295.37
I tried using a dplyr pipe:
test <- u %>%
  filter(Modifier_1 %in% c("Start", "Stop")) %>%
  arrange(Time_relative_s) %>%
  mutate(diff = lead(Time_relative_s, default = first(Time_relative_s == "Stop") - Time_relative_s))
But I must not be using lead() correctly, because it only returns Time_relative_s values in the diff column. Any suggestions? Thanks for your help!
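As far as I know, lead() only shifts a column by a fixed number of rows, so it cannot jump ahead to "the next row where Modifier_1 is Stop". One way to express that lookup directly is sketched below in base R with findInterval(); `next_stop_time()` is a hypothetical helper written for illustration, and the vectors are just the columns from the example data. Rows with no later "Stop" (the Start-only "event" case) come back as NA.

```r
# Example data: only the two columns needed here (values copied from the question)
modifier <- c("Start", "Start", "Stop", "Start", "Stop",
              "Start", "Start", "Start", "Stop")
time_s <- c(122.11, 129.7, 132.26, 157.79, 258.85,
            270.12, 272.26, 275.68, 295.37)

# Hypothetical helper: for each row, return the time of the first "Stop"
# at or after that row, or NA when no later "Stop" exists.
next_stop_time <- function(modifier, time_s) {
  stop_idx <- which(modifier == "Stop")  # row positions of every Stop (increasing)
  # findInterval() counts how many Stop positions are <= (i - 1), i.e. strictly
  # before row i; adding 1 points at the first Stop at or after row i.
  k <- findInterval(seq_along(modifier) - 1, stop_idx) + 1
  time_s[stop_idx[k]]                    # an out-of-range k indexes to NA
}

next_stop <- next_stop_time(modifier, time_s)
next_stop - time_s  # time from each row to the next Stop
```

Keeping this difference only on the rows where a bout begins would then give the diff column shown in the desired output.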
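Answer 0 (score: 2)
We may need to create a grouping variable based on the occurrences of 'Stop', and then take the difference between the 'Time_relative_s' values at the positions of the first 'Stop' and the first 'Start' in 'Modifier_1'.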
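To see what that grouping variable looks like, here is a small base R sketch of the same idea; `c(FALSE, head(x, -1))` is used as a stand-in for `dplyr::lag(x, default = FALSE)`, and only the Modifier_1 values from the example are used. A "Stop" stays in its own bout, and the row after it opens a new group.

```r
modifier <- c("Start", "Start", "Stop", "Start", "Stop",
              "Start", "Start", "Start", "Stop")

# Shift the "is this row a Stop?" flag down by one row, so the Stop itself
# still belongs to its bout and the following row starts a new one.
prev_is_stop <- c(FALSE, head(modifier == "Stop", -1))

# Running count of Stops seen so far = group id
grp <- cumsum(prev_is_stop)
grp
# [1] 0 0 0 1 1 2 2 2 2
```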
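library(dplyr)
df1 %>%
  # a new group starts on the row after each "Stop"
  group_by(grp = cumsum(lag(Modifier_1 == "Stop", default = FALSE))) %>%
  # time of the first "Stop" minus time of the first "Start" within the group
  mutate(diff = Time_relative_s[match("Stop", Modifier_1)] -
           Time_relative_s[match("Start", Modifier_1)],
         # keep the value only on the first row of each group
         diff = replace(diff, row_number() > 1, NA_real_)) %>%
  ungroup %>%
  select(-grp)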
# A tibble: 9 x 4
# Behavior Modifier_1 Time_relative_s diff
# <chr> <chr> <dbl> <dbl>
#1 BodyLength Start 122. 10.1
#2 Growl Start 130. NA
#3 Body Length Stop 132. NA
#4 Body Length Start 158. 101.
#5 Body Length Stop 259. NA
#6 Body Length Start 270. 25.2
#7 Bark Start 272. NA
#8 Growl Start 276. NA
#9 Body Length Stop 295. NA
Data:
df1 <- structure(list(Behavior = c("BodyLength", "Growl", "Body Length",
"Body Length", "Body Length", "Body Length", "Bark", "Growl",
"Body Length"), Modifier_1 = c("Start", "Start", "Stop", "Start",
"Stop", "Start", "Start", "Start", "Stop"), Time_relative_s = c(122.11,
129.7, 132.26, 157.79, 258.85, 270.12, 272.26, 275.68, 295.37
)), row.names = c(NA, -9L), class = "data.frame")
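If dplyr is not available, the same logic as the answer can be sketched in base R. This is a translation for illustration, not part of the original answer: split() plays the role of group_by(), match() finds the first Start and Stop per group, and `bout` is a hypothetical name. A group with a Start but no Stop gets NA, just as in the dplyr version.

```r
df1 <- data.frame(
  Behavior = c("BodyLength", "Growl", "Body Length", "Body Length", "Body Length",
               "Body Length", "Bark", "Growl", "Body Length"),
  Modifier_1 = c("Start", "Start", "Stop", "Start", "Stop",
                 "Start", "Start", "Start", "Stop"),
  Time_relative_s = c(122.11, 129.7, 132.26, 157.79, 258.85,
                      270.12, 272.26, 275.68, 295.37)
)

# Same grouping as the dplyr answer: a new group starts on the row after each Stop
grp <- cumsum(c(FALSE, head(df1$Modifier_1 == "Stop", -1)))

# Per group: time of the first Stop minus time of the first Start
# (NA when either is missing, e.g. a trailing Start with no Stop)
bout <- vapply(split(df1, grp), function(d) {
  d$Time_relative_s[match("Stop", d$Modifier_1)] -
    d$Time_relative_s[match("Start", d$Modifier_1)]
}, numeric(1))

# Put each value on the first row of its group, leaving the other rows NA
df1$diff <- NA_real_
df1$diff[!duplicated(grp)] <- bout
df1
```

Because grp never decreases, split() returns the groups in row order, so the values line up with the first row of each group.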