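I have a series of timestamps representing a user's activity on a website. I want to split these timestamps into sessions (defined as runs of timestamps that are less than one hour apart), then calculate the length of each session and the gaps between sessions.
A sample dataset looks like this:
Is there a way in SAS or R to loop over this series of timestamps so that I can calculate the session lengths (for example, the 23:00 session on 01JUL14) and the gaps between sessions (for example, the time between the July 1 session and the July 9 session)?
Thanks!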
Answer 0 (score: 1)
# reproducible input data
dta <- data.frame(time = as.POSIXlt(c("2006-10-21 18:47:22",
"2006-10-21 18:57:58",
"2006-10-21 19:59:05",
"2006-10-21 20:05:05",
"2006-10-21 20:06:05",
"2006-10-21 20:07:05",
"2006-10-21 22:04:05",
"2006-10-21 22:05:05")))
# Flag the timestamps that start or end a session.
# A new session starts when the gap since the previous timestamp is one hour or more.
gap.mins <- as.numeric(diff(dta$time), units = "mins") # force minutes; diff() otherwise picks its own units
dta$s.start <- c(TRUE, gap.mins >= 60) # TRUE = first timestamp of a session (60 min = inactivity threshold)
dta$s.stop <- c(dta$s.start[-1], TRUE) # TRUE = last timestamp of a session
# indices of the timestamps that mark a session
sessions <- data.frame(
s.1 = which(dta$s.start), # starts
s.2 = which(dta$s.stop)) # stops
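# session durations and the gaps between consecutive sessions, both in minutes
(durations <- difftime(dta$time[sessions$s.2], dta$time[sessions$s.1], units = "mins"))
# gap = start of the next session minus end of the previous one
(gaps <- difftime(dta$time[sessions$s.1[-1]], dta$time[sessions$s.2[-nrow(sessions)]], units = "mins"))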
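
The same idea can also be written as a running session counter, which may be easier to reuse. The following is only a sketch and not part of the answer above: `sessionize` and `max.gap.mins` are hypothetical names, timestamps are assumed to parse as UTC, and at least one timestamp is assumed to be present.

```r
# Hypothetical helper: group timestamps into sessions and summarise them.
# max.gap.mins is an assumed parameter; 60 matches the one-hour rule in the question.
sessionize <- function(times, max.gap.mins = 60) {
  times <- sort(as.POSIXct(times, tz = "UTC"))
  # minutes between consecutive timestamps, with the units stated explicitly
  gap.mins <- as.numeric(diff(times), units = "mins")
  # the counter increases by one each time the inactivity threshold is reached
  id <- cumsum(c(TRUE, gap.mins >= max.gap.mins))
  first <- !duplicated(id)                  # first timestamp of each session
  last <- !duplicated(id, fromLast = TRUE)  # last timestamp of each session
  out <- data.frame(session = id[first], start = times[first], end = times[last])
  out$duration.mins <- as.numeric(difftime(out$end, out$start, units = "mins"))
  # gap before a session = its start minus the end of the previous session
  out$gap.before.mins <- c(NA, as.numeric(
    difftime(out$start[-1], out$end[-nrow(out)], units = "mins")))
  out
}

stamps <- c("2006-10-21 18:47:22", "2006-10-21 18:57:58",
            "2006-10-21 19:59:05", "2006-10-21 20:05:05",
            "2006-10-21 20:06:05", "2006-10-21 20:07:05",
            "2006-10-21 22:04:05", "2006-10-21 22:05:05")
res <- sessionize(stamps)
print(res)
```

This returns one row per session, so durations and gaps sit next to the start and end times they belong to. The first session has no preceding gap, which is why its `gap.before.mins` is `NA`.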