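This is a follow-up question to this post: Loop through dataframe in R and measure time difference between two values
I already got excellent help there with the following code, which calculates the time difference (in minutes) between a stimulus and the next response: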
library(dplyr)
df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M"))
df %>%
  arrange(User, Date) %>%
  mutate(difftime = difftime(lead(Date), Date, units = "mins")) %>%
  group_by(User) %>%
  filter((StimuliA == 1 | StimuliB == 1) & lead(Responses) == 1)
Dataset:
structure(list(User = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L), Date = c("24.11.2015 20:39",
"25.11.2015 11:42", "11.01.2016 22:46", "26.11.2015 22:42", "04.03.2016 05:45",
"24.11.2015 13:13", "25.11.2015 13:59", "27.11.2015 12:18", "28.05.2016 06:49",
"06.07.2016 09:46", "03.12.2015 09:32", "07.12.2015 08:18", "08.12.2015 19:40",
"08.12.2015 19:40", "22.12.2015 08:50", "22.12.2015 08:52", "22.12.2015 08:52",
"22.12.2015 20:46"), StimuliA = c(1L, 0L, 0L, 1L, 1L, 1L, 0L,
1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), StimuliB = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L), Responses = c(0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 1L, 1L, 1L)), .Names = c("User", "Date", "StimuliA",
"StimuliB", "Responses"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L), spec = structure(list(cols = structure(list(
User = structure(list(), class = c("collector_integer", "collector"
)), Date = structure(list(), class = c("collector_character",
"collector")), StimuliA = structure(list(), class = c("collector_integer",
"collector")), StimuliB = structure(list(), class = c("collector_integer",
"collector")), Responses = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB",
"Responses")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
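Goal / question
lead helps to determine the time difference between a stimulus == 1 (A or B) and the next response (Responses == 1), ordered by date/time. How would I change that code to find the time difference between a Stimulus A or B and the last response in that sequence (that is, the last response before the next stimulus occurs)?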
Desired output:
User  Date              StimuliA  StimuliB  Responses  time diff (mins)  Seq_ID
1     24.11.2015 20:39  1         0         0                            1_1_0
1     25.11.2015 11:42  0         0         1                            1_1_0
1     11.01.2016 22:46  0         0         1          69247             1_1_0
2     26.11.2015 22:42  1         0         0                            2_1_0
2     04.03.2016 05:45  0         1         0                            2_1_1
3     24.11.2015 13:13  1         0         0                            3_1_0
3     25.11.2015 13:59  0         0         1          1486              3_1_0
3     27.11.2015 12:18  1         0         0                            3_2_0
3     28.05.2016 06:49  0         0         1                            3_2_0
3     06.07.2016 09:46  0         0         1          319528            3_2_0
4     03.12.2015 09:32  1         0         0                            4_1_0
4     07.12.2015 08:18  1         0         0                            4_2_0
4     08.12.2015 19:40  0         0         1          2122              4_1_0
4     08.12.2015 19:40  0         1         0                            4_2_1
4     22.12.2015 08:50  0         0         1          19510             4_2_1
5     22.12.2015 08:52  0         0         1                            5_0_0
5     22.12.2015 08:52  0         0         1                            5_0_0
5     22.12.2015 20:46  0         0         1                            5_0_0
For Stimulus A this means the values c(69247, 319528, 2122), and for B c(1486, 19510).
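As a sanity check, the first expected value can be reproduced directly with difftime. This is only a minimal sketch: it assumes the timestamps are parsed in UTC (my assumption, to keep daylight-saving shifts out of the result), and stim_time and last_resp are illustrative names, not columns of the data set.

```r
# Timestamps of User 1's stimulus and of the last response in that sequence
fmt <- "%d.%m.%Y %H:%M"
stim_time <- as.POSIXct("24.11.2015 20:39", format = fmt, tz = "UTC")
last_resp <- as.POSIXct("11.01.2016 22:46", format = fmt, tz = "UTC")

# Difference in minutes, as a plain number
mins <- as.numeric(difftime(last_resp, stim_time, units = "mins"))
mins
```

With these inputs the difference is 48 days, 2 hours and 7 minutes, which matches the 69247 in the table.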
Answer 0 (score: 2)
Try this.
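# df$Date <- as.POSIXct(strptime(df$Date,"%d.%m.%Y %H:%M"))
df %>%
  arrange(User, Date) %>%
  group_by(User) %>%
  mutate(
    # Date of the most recent StimuliA == 1 row within each user
    last.date = Date[which(StimuliA == 1L)[c(1, 1:sum(StimuliA == 1L))][cumsum(StimuliA == 1L) + 1]]
  ) %>%
  mutate(
    # Minutes since that stimulus, on response rows only (explicit units instead of the difftime default)
    timesince = ifelse(Responses == 1L, as.numeric(difftime(Date, last.date, units = "mins")), NA_real_)
  )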
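This first creates a column that records the date of the last stimulus, then uses ifelse to get the difference between the current date and that last stimulus date. You can filter to extract just the last responses.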
There is a cleaner way to do the "last.date" operation using zoo::na.locf, but I didn't want to assume you were okay with another package dependency.
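For reference, the zoo variant could look roughly like the sketch below. It assumes the zoo package is installed; toy, stim.date and filled are made-up names for illustration, and I have not run this against your full data set.

```r
library(dplyr)
library(zoo)

# Toy data: one user, two stimuli, two responses (illustrative values only)
toy <- data.frame(
  User = 1L,
  Date = as.POSIXct(
    c("2015-11-24 13:13", "2015-11-25 13:59", "2015-11-27 12:18", "2016-05-28 06:49"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  StimuliA = c(1L, 0L, 1L, 0L),
  Responses = c(0L, 1L, 0L, 1L)
)

filled <- toy %>%
  arrange(User, Date) %>%
  group_by(User) %>%
  mutate(
    # Keep the timestamp on stimulus rows only, NA everywhere else
    stim.date = replace(Date, StimuliA != 1L, NA),
    # Carry the last stimulus timestamp forward; na.rm = FALSE keeps leading NAs
    last.date = na.locf(stim.date, na.rm = FALSE)
  ) %>%
  ungroup()
```

Rows that come before a user's first stimulus keep last.date as NA here, whereas the index-based version above assigns them the first stimulus date.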
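Edit: To identify the sequences (if I understood correctly what you mean by "sequence"), continue the chain with
%>% mutate(sequence = cumsum(StimuliA))
which identifies sequences, defined as the observations following a positive stimulus. To keep only the last response of each sequence, continue the chain with
%>% group_by(User, sequence) %>%
filter(timesince == max(timesince, na.rm = TRUE))
This groups by sequence (and user), then extracts the maximum time difference within each sequence, which corresponds to the last positive response of that sequence.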
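Putting the pieces together, here is a self-contained sketch on a subset of your rows (Users 1 and 3). It differs from the chain above in two ways, both of which are my assumptions about what you want: a new sequence starts at either StimuliA == 1 or StimuliB == 1, and the time is measured from the first row of each sequence. Dates are parsed in UTC to keep daylight-saving shifts out of the numbers, and slice_tail needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Subset of the question's data (Users 1 and 3)
df <- data.frame(
  User = c(1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L),
  Date = c("24.11.2015 20:39", "25.11.2015 11:42", "11.01.2016 22:46",
           "24.11.2015 13:13", "25.11.2015 13:59", "27.11.2015 12:18",
           "28.05.2016 06:49", "06.07.2016 09:46"),
  StimuliA = c(1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L),
  StimuliB = 0L,
  Responses = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L)
)
df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M", tz = "UTC"))

result <- df %>%
  arrange(User, Date) %>%
  group_by(User) %>%
  # A new sequence starts at every stimulus of either type
  mutate(sequence = cumsum(StimuliA == 1L | StimuliB == 1L)) %>%
  group_by(User, sequence) %>%
  # The first row of a sequence is its stimulus, so measure from there
  mutate(timesince = as.numeric(difftime(Date, first(Date), units = "mins"))) %>%
  # Drop rows before any stimulus, keep responses, then the last one per sequence
  filter(sequence > 0, Responses == 1L) %>%
  slice_tail(n = 1) %>%
  ungroup()

result$timesince
```

On this subset the three values are 69247, 1486 and 319528 minutes, matching the desired output for Users 1 and 3.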