How to calculate the difference between timestamps under certain conditions?

Date: 2016-11-03 08:29:05

Tags: r

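I want to compute the difference between the timestamps of each registered user ID. I only want to measure the difference for users who have both a login and a logout status. Some users only have a logout or only a login status; for them, I just want to mark the result as NA.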

Some data:

  library(dplyr)
  start <- as.POSIXct("2012-01-15")
  interval <- 70
  end <- start + as.difftime(1, units="days")
  tseq<- seq(from=start, by=interval*70, to=end)
  employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
  status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
  # put together
  data <- data.frame(tseq, employeID, status)

           tseq            employeID   status
  #1  2012-01-15 00:00:00       1_e  login
  #2  2012-01-15 01:21:40       1_e logout
  #3  2012-01-15 02:43:20       2_b  login
  #4  2012-01-15 04:05:00       2_b logout
  #5  2012-01-15 05:26:40       3_c  login
  #6  2012-01-15 06:48:20       3_c logout
  #7  2012-01-15 08:10:00     100_c  login
  #8  2012-01-15 09:31:40       4_d logout
  #9  2012-01-15 10:53:20       4_d  login
  #10 2012-01-15 12:15:00      52_f logout
  #11 2012-01-15 13:36:40       9_f  login
  #12 2012-01-15 14:58:20       9_f logout
  #13 2012-01-15 16:20:00       7_u  login
  #14 2012-01-15 17:41:40       7_u logout
  #15 2012-01-15 19:03:20      10_5 logout
  #16 2012-01-15 20:25:00      22_2  login
  #17 2012-01-15 21:46:40      33_a logout
  #18 2012-01-15 23:08:20      33_a  login  


  test<- data %>% 
    group_by(employeID) %>% 
    mutate(time.difference = tseq - lag(tseq))

However, this only seems to produce a constant time.difference.
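For reference, a minimal sketch of how the mutate()/lag() attempt above could be restricted to the stated condition, assuming you want to keep every row and fill in a value only on a logout row that directly follows a login row of the same user (all other rows get NA). It rebuilds the sample data so it runs on its own; tz = "UTC", stringsAsFactors = FALSE and reporting the difference in seconds are assumptions, not part of the question.

```r
library(dplyr)

# Rebuild the sample data from the question (UTC assumed so results do not depend on the local time zone)
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = 70 * 70, to = end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
data <- data.frame(tseq, employeID, status, stringsAsFactors = FALSE)

result <- data %>%
  arrange(employeID, tseq) %>%
  group_by(employeID) %>%
  # A value only on a logout row whose previous row (same user) is a login;
  # every other row, including users with a single status, gets NA
  mutate(time.difference = if_else(
    status == "logout" & lag(status) == "login",
    as.numeric(difftime(tseq, lag(tseq), units = "secs")),
    NA_real_
  )) %>%
  ungroup()
```

With this sample, users 4_d and 33_a log out before they log in, so they get NA on every row under this rule.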

3 Answers:

Answer 0 (score: 2)

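How about this? Mainly, you seem to be using mutate where you need summarise. I've also converted the status column from factor to character and added an ifelse statement so that only users with both "login" and "logout" entries get a value: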

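test <- data %>% 
    mutate( status = as.character( status ) ) %>%
    group_by( employeID ) %>% 
    # only users with both a "login" and a "logout" row get a value; all others get NA
    summarise( time.difference = ifelse( "login" %in% status && "logout" %in% status, 
                                         # ifelse() drops the difftime class, so pin the units explicitly;
                                         # "hours" is what difftime picks automatically for these 4900 s gaps
                                         difftime( tseq[ status == "logout" ], tseq[ status == "login" ], units = "hours" ), 
                                         NA ) 
    )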

This gives:

> head( test )
# A tibble: 6 × 2
employeID time.difference
      <fctr>           <dbl>
1       1_e        1.361111
2      10_5              NA
3     100_c              NA
4       2_b        1.361111
5      22_2              NA
6       3_c        1.361111

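As others have pointed out, your sample data has a constant time interval, so the difference is always the same wherever a value exists. I assume your real data looks somewhat different, so you'll get more meaningful output.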
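A minimal sketch of the same summarise idea with two assumed changes: the difference is reported in seconds via units = "secs" instead of relying on difftime's automatic unit choice, and a user gets a value only when they have exactly one login and one logout (so repeated statuses also yield NA). The data rebuild, tz = "UTC" and the name diffs are illustrative assumptions.

```r
library(dplyr)

# Rebuild the sample data from the question (UTC assumed)
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = 70 * 70, to = end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
data <- data.frame(tseq, employeID, status, stringsAsFactors = FALSE)

diffs <- data %>%
  group_by(employeID) %>%
  summarise(time.difference =
    if (sum(status == "login") == 1 && sum(status == "logout") == 1) {
      # exactly one login and one logout: logout time minus login time, in seconds
      as.numeric(difftime(tseq[status == "logout"], tseq[status == "login"], units = "secs"))
    } else {
      NA_real_  # unpaired or repeated statuses
    })
```

Note that users whose logout precedes their login in this sample (4_d and 33_a) come out negative, here as -4900.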

Answer 1 (score: 1)

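We first filter out groups with unpaired statuses by checking the number of distinct statuses in each group. Then we use dplyr::do to compute the time difference for each group: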

 library(dplyr)
  start <- as.POSIXct("2012-01-15")
  interval <- 70
  end <- start + as.difftime(1, units="days")
  tseq<- seq(from=start, by=interval*70, to=end)
  employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
  status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
  # put together
  DF <- data.frame(tseq, employeID, status)


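  testDF <- DF %>% 
    dplyr::group_by(employeID) %>%
    # keep only users with more than one distinct status, i.e. both login and logout
    dplyr::filter(dplyr::n_distinct(status) > 1) %>% 
    # one row per user: login time, logout time and their difference in seconds
    dplyr::do(., data.frame(logINTime = .$tseq[.$status == "login"],
                            logOUTTime = .$tseq[.$status == "logout"],
                            deltaTime = difftime(.$tseq[.$status == "logout"], .$tseq[.$status == "login"], units = "secs"))) %>%
    as.data.frame()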


testDF
  # employeID           logINTime          logOUTTime deltaTime
# 1       1_e 2012-01-15 00:00:00 2012-01-15 01:21:40      4900
# 2       2_b 2012-01-15 02:43:20 2012-01-15 04:05:00      4900
# 3       3_c 2012-01-15 05:26:40 2012-01-15 06:48:20      4900
# 4      33_a 2012-01-15 23:08:20 2012-01-15 21:46:40     -4900
# 5       4_d 2012-01-15 10:53:20 2012-01-15 09:31:40     -4900
# 6       7_u 2012-01-15 16:20:00 2012-01-15 17:41:40      4900
# 7       9_f 2012-01-15 13:36:40 2012-01-15 14:58:20      4900
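do() is superseded in current dplyr, so here is a minimal sketch of the same result built with filter() and summarise() instead. It assumes each kept user has one login and one logout; min() and max() are used only to collapse each group to a single timestamp. The data rebuild, tz = "UTC" and the name paired are illustrative assumptions.

```r
library(dplyr)

# Rebuild the sample data from the question (UTC assumed)
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = 70 * 70, to = end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
DF <- data.frame(tseq, employeID, status, stringsAsFactors = FALSE)

paired <- DF %>%
  group_by(employeID) %>%
  # keep only users that have both a login and a logout row
  filter(n_distinct(status) > 1) %>%
  summarise(
    logINTime  = min(tseq[status == "login"]),
    logOUTTime = max(tseq[status == "logout"])
  ) %>%
  # negative when the logout precedes the login (4_d and 33_a in this sample)
  mutate(deltaTime = difftime(logOUTTime, logINTime, units = "secs"))
```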

Answer 2 (score: 0)

This line seems to create a constant time interval:

tseq<- seq(from=start, by=interval*70, to=end)

So when you then take the differences, won't they be constant as well?
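A quick check of that point, rebuilding tseq as in the question (tz = "UTC" is an assumption): by = interval*70 is 70 * 70 = 4900 seconds, so every gap between consecutive timestamps is identical, and any per-user lag() difference on this sample can only come out as a multiple of 4900 seconds.

```r
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = 70 * 70, to = end)

# Seconds between consecutive timestamps
gaps <- diff(as.numeric(tseq))
unique(gaps)
# [1] 4900
```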