How to create a new variable based on POSIXct differences in log files using a for loop

Time: 2019-06-25 14:27:14

Tags: r for-loop if-statement posixct

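I am trying to loop through a dataset of log files to add a variable that stores the server session number for each observation. For the first row, I want to create a new variable 'session number' with the value 1. After that, if 'ResearchNumber' differs from the previous row, I want the next row to get a different session number. If it is the same 'ResearchNumber', I want to check whether the difference in the POSIXct variable is greater than 30 minutes (1800 seconds). If so, I want a different session number (the previous one increased by 1). In all other cases, the session number should be the same as in the previous row. In short, I want to create session numbers per participant based on inactivity of more than 30 minutes.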

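I have tried several things, but my code does not seem to loop through all the rows, and in my other attempts the time difference is not calculated correctly.

I hope someone can help me solve this. Any help is appreciated!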


# create example data

ResearchNumber <- c("AL001","AL002","AL003")

DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT')
)

logdata <- data.frame(ResearchNumber, DateTimeTag)


# loop through logdata to add variable to every observation with a server session number

linecount <- 1
for (lines in logdata) {
  if (linecount == 1) {
    session_number <- 1
    logdata$session_number <- session_number
    datetime <- logdata$DateTimeTag
    participantbefore <- logdata$ResearchNumber
    linecount <- (linecount + 1)
  } 
  else if (linecount > 1) {
    difference <- (logdata$DateTimeTag - datetime)
    if (logdata$ResearchNumber != participantbefore) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    }
    else if (difference > 18000) {
      logdata$session_number <- (session_number + 1)
      participantbefore <- logdata$ResearchNumber
      session_number <- (session_number + 1)
      datetime <- logdata$DateTimeTag
    }
    else {
      logdata$session_number <- (session_number)
      participantbefore <- logdata$ResearchNumber
      datetime <- logdata$DateTimeTag
    }
  }
}

1 answer:

Answer 0 (score: 0)

You beat me to it, @docendo discimus!

Here is a dplyr solution.

library(tidyverse) # brings in dplyr library

# make better example data
ResearchNumber <- c("AL001","AL002","AL003", "AL003", "AL003")

DateTimeTag <- c(
  as.POSIXct('2014-09-29 10:35:40', tz='GMT'),
  as.POSIXct('2014-09-29 10:35:42', tz='GMT'),
  as.POSIXct('2014-09-29 10:38:18', tz='GMT'),
  as.POSIXct('2014-09-29 12:00:00', tz='GMT'),
  as.POSIXct('2014-09-29 12:15:18', tz='GMT')
)

logdata <- data.frame(ResearchNumber, DateTimeTag)

logdata

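logdata <- logdata %>% 
  arrange(ResearchNumber) %>% 
  group_by(ResearchNumber) %>% 
  # gap to the previous row of the same participant, in minutes (NA on the first row)
  mutate(difftime = difftime(DateTimeTag, lag(DateTimeTag), units = "mins"),
         # TRUE marks the first row of a new session
         DiffSess = case_when(
           is.na(difftime) ~ TRUE,   # first row of a participant
           difftime > 30 ~ TRUE,     # more than 30 minutes of inactivity
           TRUE ~ FALSE)) %>% 
  ungroup() %>% 
  # the running count of session starts is the session number
  mutate(session_number = cumsum(DiffSess))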
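
A side note on why the pipeline passes units = "mins" explicitly. Subtracting two POSIXct values with `-` lets R pick the unit automatically, so the raw number can be in seconds, minutes, hours or days depending on the size of the gap. The sketch below uses two timestamps from the example data; the variable names are only illustrative.

```r
# Two timestamps taken from the example data (rows 3 and 4)
t_prev <- as.POSIXct("2014-09-29 10:38:18", tz = "GMT")
t_curr <- as.POSIXct("2014-09-29 12:00:00", tz = "GMT")

# Plain subtraction: R chooses the unit itself (hours here, because the gap
# is longer than one hour), so the raw number is about 1.36, not 81.7
gap_auto <- t_curr - t_prev
units(gap_auto)

# Explicit units make the number safe to compare against a threshold
gap_mins <- difftime(t_curr, t_prev, units = "mins")
exceeds_30_min <- as.numeric(gap_mins) > 30
```

Comparing `as.numeric(gap_auto)` with 30 would wrongly report no timeout here, which is one plausible reason a threshold check on an automatically scaled difference gives unexpected results.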

Result of the dplyr pipeline:

  ResearchNumber DateTimeTag         session_number difftime  DiffSess
  <fct>          <dttm>                       <int> <drtn>    <lgl>   
1 AL001          2014-09-29 10:35:40              1   NA mins TRUE    
2 AL002          2014-09-29 10:35:42              2   NA mins TRUE    
3 AL003          2014-09-29 10:38:18              3   NA mins TRUE    
4 AL003          2014-09-29 12:00:00              4 81.7 mins TRUE    
5 AL003          2014-09-29 12:15:18              4 15.3 mins FALSE
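
For readers who want to keep an explicit loop as in the question, here is a minimal base R sketch of the same logic. It is not the asker's code. It rests on a few assumptions: the rows are sorted by participant and time first, the loop runs over row indices with `seq_len(nrow(logdata))` (`for (lines in logdata)` iterates over the columns of a data frame, not its rows), and the timeout is 30 * 60 = 1800 seconds (the 18000 in the question's loop would be five hours). The names `timeout_secs`, `gap_secs`, `same_participant` and `new_session` are illustrative.

```r
# Example data from the answer above
logdata <- data.frame(
  ResearchNumber = c("AL001", "AL002", "AL003", "AL003", "AL003"),
  DateTimeTag = as.POSIXct(
    c("2014-09-29 10:35:40", "2014-09-29 10:35:42", "2014-09-29 10:38:18",
      "2014-09-29 12:00:00", "2014-09-29 12:15:18"),
    tz = "GMT"
  ),
  stringsAsFactors = FALSE
)

# Sort so that each participant's rows are adjacent and in time order
logdata <- logdata[order(logdata$ResearchNumber, logdata$DateTimeTag), ]

timeout_secs <- 30 * 60  # assumed threshold: 30 minutes in seconds
session_number <- integer(nrow(logdata))

for (i in seq_len(nrow(logdata))) {
  if (i == 1) {
    session_number[i] <- 1L  # the first row always opens session 1
    next
  }
  same_participant <- logdata$ResearchNumber[i] == logdata$ResearchNumber[i - 1]
  # Explicit units so the number is always in seconds
  gap_secs <- as.numeric(difftime(logdata$DateTimeTag[i],
                                  logdata$DateTimeTag[i - 1],
                                  units = "secs"))
  # New session on a participant change or on a gap above the timeout
  new_session <- !same_participant || gap_secs > timeout_secs
  session_number[i] <- session_number[i - 1] + as.integer(new_session)
}

logdata$session_number <- session_number
# session_number is now 1 2 3 4 4, matching the dplyr result above
```

The loop is easier to step through when debugging, while the grouped `cumsum` version avoids per-row indexing and is likely to scale better on large log files.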