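I want to compute the difference between the timestamps of each registered user ID. I only want to measure this difference for users who have both a login and a logout status. Some users have only a logout or only a login status; for them, I just want to mark the result as NA.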
Some data:
library(dplyr)
start <- as.POSIXct("2012-01-15")
interval <- 70
end <- start + as.difftime(1, units="days")
tseq<- seq(from=start, by=interval*70, to=end)
employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
# put together
data <- data.frame(tseq, employeID, status)
#                   tseq employeID status
#1 2012-01-15 00:00:00 1_e login
#2 2012-01-15 01:21:40 1_e logout
#3 2012-01-15 02:43:20 2_b login
#4 2012-01-15 04:05:00 2_b logout
#5 2012-01-15 05:26:40 3_c login
#6 2012-01-15 06:48:20 3_c logout
#7 2012-01-15 08:10:00 100_c login
#8 2012-01-15 09:31:40 4_d logout
#9 2012-01-15 10:53:20 4_d login
#10 2012-01-15 12:15:00 52_f logout
#11 2012-01-15 13:36:40 9_f login
#12 2012-01-15 14:58:20 9_f logout
#13 2012-01-15 16:20:00 7_u login
#14 2012-01-15 17:41:40 7_u logout
#15 2012-01-15 19:03:20 10_5 logout
#16 2012-01-15 20:25:00 22_2 login
#17 2012-01-15 21:46:40 33_a logout
#18 2012-01-15 23:08:20 33_a login
test<- data %>%
group_by(employeID) %>%
mutate(time.difference = tseq - lag(tseq))
However, this only seems to produce a constant time.difference.
Answer 0 (score: 2)
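How about this? Mainly, you seem to be using mutate where you need summarise. I have also converted the status column from factor to character and added an ifelse statement so that only users with both a "login" and a "logout" entry get a value: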
test <- data %>%
  mutate(status = as.character(status)) %>%   # factor -> character
  group_by(employeID) %>%
  summarise(time.difference = ifelse("login" %in% status && "logout" %in% status,
                                     # ifelse() drops the difftime class, so state the unit explicitly
                                     difftime(tseq[status == "logout"], tseq[status == "login"],
                                              units = "hours"),
                                     NA))
This gives:
> head( test )
# A tibble: 6 × 2
employeID time.difference
<fctr> <dbl>
1 1_e 1.361111
2 10_5 NA
3 100_c NA
4 2_b 1.361111
5 22_2 NA
6 3_c 1.361111
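A variant of the same idea, offered as a sketch: a scalar if/else instead of ifelse() makes the per-group check explicit, and as.numeric(difftime(..., units = "secs")) pins the unit. (ifelse() drops the difftime class, so the 1.361111 above is a bare number of hours, i.e. 4900 s / 3600.) The sessions data frame is hypothetical, and the code assumes each user has at most one login row and one logout row.

```r
library(dplyr)

# Hypothetical toy data: "a" has a login/logout pair, "b" only logs out,
# "c" logs out before it logs in.
sessions <- data.frame(
  tseq = as.POSIXct("2012-01-15 00:00:00", tz = "UTC") + c(0, 4900, 9800, 14700, 19600),
  employeID = c("a", "a", "b", "c", "c"),
  status = c("login", "logout", "logout", "logout", "login"),
  stringsAsFactors = FALSE
)

durations <- sessions %>%
  group_by(employeID) %>%
  summarise(
    # Scalar if/else: evaluated once per group, so difftime() only runs when
    # both statuses exist. Assumes at most one row of each status per user.
    time.difference = if (all(c("login", "logout") %in% status)) {
      as.numeric(difftime(tseq[status == "logout"], tseq[status == "login"],
                          units = "secs"))
    } else {
      NA_real_
    },
    .groups = "drop"
  )

print(durations)
```

A negative value (user "c" here) means the logout timestamp precedes the login, which is what happens for users 4_d and 33_a in the question's data.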
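As others have pointed out, your data really do contain a constant time interval, so whenever there is a value, it will always be the same. I assume your real data look somewhat different, so you will get more meaningful output.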
Answer 1 (score: 1)
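We first filter out the groups with unpaired statuses by counting the distinct status values in each group. Then we use dplyr::do to compute the time difference for each group: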
library(dplyr)
start <- as.POSIXct("2012-01-15")
interval <- 70
end <- start + as.difftime(1, units="days")
tseq<- seq(from=start, by=interval*70, to=end)
employeID <-c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status<- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
# put together
DF <- data.frame(tseq, employeID, status)
#                   tseq employeID status
#1 2012-01-15 00:00:00 1_e login
#2 2012-01-15 01:21:40 1_e logout
#3 2012-01-15 02:43:20 2_b login
#4 2012-01-15 04:05:00 2_b logout
#5 2012-01-15 05:26:40 3_c login
#6 2012-01-15 06:48:20 3_c logout
#7 2012-01-15 08:10:00 100_c login
#8 2012-01-15 09:31:40 4_d logout
#9 2012-01-15 10:53:20 4_d login
#10 2012-01-15 12:15:00 52_f logout
#11 2012-01-15 13:36:40 9_f login
#12 2012-01-15 14:58:20 9_f logout
#13 2012-01-15 16:20:00 7_u login
#14 2012-01-15 17:41:40 7_u logout
#15 2012-01-15 19:03:20 10_5 logout
#16 2012-01-15 20:25:00 22_2 login
#17 2012-01-15 21:46:40 33_a logout
#18 2012-01-15 23:08:20 33_a login
testDF <- DF %>%
  dplyr::group_by(employeID) %>%
  dplyr::filter(dplyr::n_distinct(status) > 1) %>%   # keep users with both statuses
  dplyr::do(data.frame(logINTime  = .$tseq[.$status == "login"],
                       logOUTTime = .$tseq[.$status == "logout"],
                       deltaTime  = difftime(.$tseq[.$status == "logout"],
                                             .$tseq[.$status == "login"], units = "secs"))) %>%
  as.data.frame()
testDF
# employeID logINTime logOUTTime deltaTime
# 1 1_e 2012-01-15 00:00:00 2012-01-15 01:21:40 4900
# 2 2_b 2012-01-15 02:43:20 2012-01-15 04:05:00 4900
# 3 3_c 2012-01-15 05:26:40 2012-01-15 06:48:20 4900
# 4 33_a 2012-01-15 23:08:20 2012-01-15 21:46:40 -4900
# 5 4_d 2012-01-15 10:53:20 2012-01-15 09:31:40 -4900
# 6 7_u 2012-01-15 16:20:00 2012-01-15 17:41:40 4900
# 7 9_f 2012-01-15 13:36:40 2012-01-15 14:58:20 4900
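Since dplyr 1.0.0, do() is superseded. A sketch of the same filter-then-pair logic with summarise() follows. It assumes exactly one login and one logout per kept user and an explicit UTC time zone (neither is stated in the question). As rows 33_a and 4_d in the output above show, a negative deltaTime means the logout was recorded before the login.

```r
library(dplyr)

# The question's data, rebuilt with an explicit time zone (assumption: UTC).
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
tseq <- seq(from = start, by = 70 * 70, to = end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f",
               "9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout",
            "login","logout","login","logout","login","logout","logout","login",
            "logout","login")
DF <- data.frame(tseq, employeID, status, stringsAsFactors = FALSE)

paired <- DF %>%
  group_by(employeID) %>%
  # Keep only users that have both a "login" and a "logout" row.
  filter(n_distinct(status) > 1) %>%
  # One output row per user; assumes exactly one login and one logout each.
  summarise(
    logINTime  = tseq[status == "login"],
    logOUTTime = tseq[status == "logout"],
    deltaTime  = as.numeric(difftime(logOUTTime, logINTime, units = "secs")),
    .groups = "drop"
  )

print(paired)
```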
Answer 2 (score: 0)
This line seems to create a constant time interval:
tseq<- seq(from=start, by=interval*70, to=end)
So when you take differences of it, isn't the result bound to be constant?
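A quick base-R check of that point, rebuilding tseq as in the question (assumption: UTC time zone): every gap between consecutive timestamps is interval * 70 = 4900 seconds (about 1.36 hours). Each paired user's login and logout rows are adjacent in the data, so every non-NA lag difference within a group is that same 4900 seconds.

```r
# Rebuild the question's timestamp sequence (assumption: UTC time zone).
start <- as.POSIXct("2012-01-15", tz = "UTC")
end <- start + as.difftime(1, units = "days")
interval <- 70
tseq <- seq(from = start, by = interval * 70, to = end)

# Gaps between consecutive timestamps, in seconds (POSIXct stores seconds).
gaps <- diff(as.numeric(tseq))
print(unique(gaps))
```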