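I am trying to loop through a dataset of log files in order to add a variable that stores a server session number for each observation. For the first row, I want to create a new variable 'session number' with the value 1. After that, if 'ResearchNumber' differs from the previous row, I want the next row to get a different session number. If it is the same 'ResearchNumber', I want to check whether the difference in the POSIXct variable is more than 1800 seconds (30 minutes). If so, I want to create a different session number (by incrementing it by 1). In all other cases, I want the session number to stay the same as in the previous row. In short, I want to create session numbers for each participant based on periods of inactivity longer than 30 minutes.
I have tried several things, but my code does not seem to loop through all rows, and in other solutions the time difference was not calculated correctly.
I hope someone can help me with this problem. Any help is appreciated!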
# create example data
ResearchNumber <- c("AL001","AL002","AL003")
DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT')
)
logdata <- data.frame(ResearchNumber, DateTimeTag)
# loop through logdata to add variable to every observation with a server session number
linecount <- 1
for (lines in logdata) {
  if (linecount == 1) {
    session_number <- 1
    logdata$session_number <- session_number
    datetime <- logdata$DateTimeTag
    participantbefore <- logdata$ResearchNumber
    linecount <- (linecount + 1)
  } else if (linecount > 1) {
    difference <- (logdata$DateTimeTag - datetime)
    if (logdata$ResearchNumber != participantbefore) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    } else if (difference > 18000) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    } else {
      logdata$session_number <- (session_number)
      participantbefore <- logdata$ResearchNumber
      datetime <- logdata$DateTimeTag
    }
  }
}
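For reference, two things in the loop above are likely behind the symptoms described: `for (lines in logdata)` iterates over the columns of a data frame rather than its rows, and subtracting two POSIXct values returns a difftime whose units are chosen automatically, so comparing it with a plain number of seconds can mislead (30 minutes is also 1800 seconds, not 18000). The following is only a minimal sketch of a row-wise version, assuming the data are already sorted by participant and time and contain at least one row. The two extra AL003 rows and the names `timeout_secs`, `gap_secs`, and `new_session` are illustrative and not part of the original code.

```r
# Example data; the last two AL003 rows are added so the timeout rule is exercised.
logdata <- data.frame(
  ResearchNumber = c("AL001", "AL002", "AL003", "AL003", "AL003"),
  DateTimeTag = as.POSIXct(
    c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18",
      "2014-09-29 12:00:00", "2014-09-29 12:15:18"),
    tz = "GMT"
  ),
  stringsAsFactors = FALSE
)

timeout_secs <- 30 * 60  # 30 minutes expressed in seconds (hypothetical name)

# Pre-allocate the result; the first row always starts session 1.
session_number <- integer(nrow(logdata))
session_number[1] <- 1L

# seq_len(nrow(...))[-1] visits rows 2..n and is empty for a one-row data frame.
for (i in seq_len(nrow(logdata))[-1]) {
  same_participant <- logdata$ResearchNumber[i] == logdata$ResearchNumber[i - 1]
  # Request seconds explicitly instead of relying on automatically chosen units.
  gap_secs <- as.numeric(difftime(logdata$DateTimeTag[i],
                                  logdata$DateTimeTag[i - 1],
                                  units = "secs"))
  new_session <- !same_participant || gap_secs > timeout_secs
  session_number[i] <- session_number[i - 1] + as.integer(new_session)
}

logdata$session_number <- session_number
print(logdata)
```

Indexing rows with `i` and `i - 1` replaces the separate `participantbefore` and `datetime` bookkeeping variables, which is one way to keep the comparison tied to the immediately preceding row.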
Answer 0 (score: 0)
You beat me to it, @docendo discimus!
Here is a dplyr solution.
library(tidyverse) # brings in dplyr library
# make better example data
ResearchNumber <- c("AL001","AL002","AL003", "AL003", "AL003")
DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT'),
  as.POSIXct('2014-09-29 12:00:00', tz='GMT'),
  as.POSIXct('2014-09-29 12:15:18', tz='GMT')
)
logdata <- data.frame(ResearchNumber, DateTimeTag)
logdata
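logdata <- logdata %>%
  # sort by time within each participant so lag() sees the previous event
  arrange(ResearchNumber, DateTimeTag) %>%
  group_by(ResearchNumber) %>%
  mutate(difftime = difftime(DateTimeTag, lag(DateTimeTag), units = "mins"),
         DiffSess = case_when(
           is.na(difftime) ~ TRUE,  # first row of each participant
           difftime > 30 ~ TRUE,    # more than 30 minutes of inactivity
           TRUE ~ FALSE)) %>%
  ungroup() %>%
  # each TRUE starts a new session, so the running total is the session number
  mutate(session_number = cumsum(DiffSess))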
Result
  ResearchNumber DateTimeTag         session_number difftime  DiffSess
  <fct>          <dttm>                       <int> <drtn>    <lgl>
1 AL001          2014-09-29 10:35:40              1   NA mins TRUE
2 AL002          2014-09-29 10:35:42              2   NA mins TRUE
3 AL003          2014-09-29 10:38:18              3   NA mins TRUE
4 AL003          2014-09-29 12:00:00              4 81.7 mins TRUE
5 AL003          2014-09-29 12:15:18              4 15.3 mins FALSE
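The approach rests on one idea: flag every row that starts a new session, then take the cumulative sum of those flags. For readers who prefer not to load dplyr, the same idea can be sketched in base R. This is an illustrative equivalent under the same example data, not part of the original answer, and the names `prev_id`, `gap_mins`, and `new_session` are made up for the sketch.

```r
# Same five-row example data as above.
logdata <- data.frame(
  ResearchNumber = c("AL001", "AL002", "AL003", "AL003", "AL003"),
  DateTimeTag = as.POSIXct(
    c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18",
      "2014-09-29 12:00:00", "2014-09-29 12:15:18"),
    tz = "GMT"
  ),
  stringsAsFactors = FALSE
)

# Sort by participant, then by time, so each row follows its predecessor.
logdata <- logdata[order(logdata$ResearchNumber, logdata$DateTimeTag), ]
n <- nrow(logdata)

# Participant ID of the previous row (NA for the first row).
id <- as.character(logdata$ResearchNumber)
prev_id <- c(NA, id[-n])

# Gap to the previous row in minutes (NA for the first row).
gap_mins <- c(NA, as.numeric(diff(logdata$DateTimeTag), units = "mins"))

# A session starts on the first row, on a participant change,
# or after more than 30 minutes of inactivity.
new_session <- is.na(prev_id) | id != prev_id | gap_mins > 30

# The running total of the flags is the session number.
logdata$session_number <- cumsum(new_session)
print(logdata)
```

Unlike the grouped dplyr version, this sketch compares each row with the previous row of the whole sorted table, so the participant change has to be tested explicitly.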