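I want to calculate the session duration of app usage. However, the only relevant information I can get from the logs provided is the timestamp. Below is a simplified log for a single user.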
```
record_num, user_id, record_ts
-----------------------------
1, uid_1, 12:01am
2, uid_1, 12:02am
3, uid_1, 12:03am
4, uid_1, 12:22am
5, uid_1, 12:22am
6, uid_1, 12:25am
```
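Assuming a session ends after 15 minutes of inactivity, the log above contains 2 sessions. Now I want to calculate the average duration of the two sessions.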
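Worked by hand on the sample log, taking a session's duration as its last timestamp minus its first. The gap from 12:03am to 12:22am (19 minutes) is the only one longer than 15 minutes, so it splits the log into two sessions:

```latex
\begin{aligned}
d_1 &= 12{:}03 - 12{:}01 = 2\ \text{min} = 120\ \text{s},\\
d_2 &= 12{:}25 - 12{:}22 = 3\ \text{min} = 180\ \text{s},\\
\bar{d} &= \frac{d_1 + d_2}{2} = \frac{120 + 180}{2} = 150\ \text{s}.
\end{aligned}
```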
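I can get the number of sessions by first computing the time difference between consecutive records and counting a new session whenever the difference exceeds 15 minutes.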
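A minimal Python sketch of this counting step. It assumes the records are sorted by time and attaches an arbitrary placeholder date (2018-10-01), since the log only shows times of day:

```python
from datetime import datetime, timedelta

# Times from the sample log; the date 2018-10-01 is an assumed placeholder.
timestamps = [
    datetime(2018, 10, 1, 0, 1),
    datetime(2018, 10, 1, 0, 2),
    datetime(2018, 10, 1, 0, 3),
    datetime(2018, 10, 1, 0, 22),
    datetime(2018, 10, 1, 0, 22),
    datetime(2018, 10, 1, 0, 25),
]

IDLE_GAP = timedelta(minutes=15)


def count_sessions(ts):
    """The first record opens a session; every gap longer than IDLE_GAP
    between consecutive records opens another one."""
    ts = sorted(ts)
    if not ts:
        return 0
    gaps = (cur - prev for prev, cur in zip(ts, ts[1:]))
    return 1 + sum(gap > IDLE_GAP for gap in gaps)


print(count_sessions(timestamps))  # 2
```

This only yields a count, which is exactly the limitation described next: it does not tell you which records belong to which session.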
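To derive the duration, however, I need min(record_ts) and max(record_ts) for each session, and without some kind of session_id I cannot group the records into their sessions.
Is there an SQL-based approach that solves this?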
Answer 0 (score: 1)
I would use lag() plus some logic to determine when a session starts. So, to get information about each session:
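The core trick can be illustrated outside SQL first. In this plain-Python sketch (same placeholder date, records assumed sorted), each row gets a start flag by comparing it with the previous timestamp, and a running total of those flags plays the role of the missing session_id:

```python
from datetime import datetime, timedelta
from itertools import accumulate

IDLE_GAP = timedelta(minutes=15)

# Times from the sample log; the date is an assumed placeholder.
ts = [datetime(2018, 10, 1, 0, m) for m in (1, 2, 3, 22, 22, 25)]

# Like lag(record_ts, 1, record_ts): the previous timestamp, or the row's own
# timestamp for the first row so that it never counts as a gap.
prev = [ts[0]] + ts[:-1]

# 1 where a new session starts (gap longer than IDLE_GAP), else 0.
starts = [int(cur > p + IDLE_GAP) for cur, p in zip(ts, prev)]

# Running total of the start flags = session number (0-based).
session_ids = list(accumulate(starts))

print(starts)       # [0, 0, 0, 1, 0, 0]
print(session_ids)  # [0, 0, 0, 1, 1, 1]
```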
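```sql
select user_id, session,
       min(record_ts) as session_start,
       max(record_ts) as session_end,
       timestamp_diff(max(record_ts), min(record_ts), second) as dur_seconds
from (select l.*,
             -- running count of session starts = session number
             countif(record_ts > timestamp_add(prev_record_ts, interval 15 minute))
               over (partition by user_id order by record_ts) as session
      from (select l.*,
                   -- previous timestamp; the first row defaults to its own timestamp
                   lag(record_ts, 1, record_ts) over (partition by user_id order by record_ts) as prev_record_ts
            from log l
           ) l
     ) l
group by user_id, session;
```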
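To try the query locally, here is a sketch that runs the same nested-subquery logic against SQLite from Python. It is a translation, not BigQuery: `sum(case ...)` stands in for `countif`, `datetime(x, '+15 minutes')` for `timestamp_add`, `strftime('%s', ...)` for `timestamp_diff`, and `coalesce(lag(...), record_ts)` for the three-argument `lag`. It assumes an SQLite build with window-function support (3.25 or later) and uses a placeholder date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table log (record_num integer, user_id text, record_ts text)")
conn.executemany(
    "insert into log values (?, ?, ?)",
    [
        (1, "uid_1", "2018-10-01 00:01:00"),
        (2, "uid_1", "2018-10-01 00:02:00"),
        (3, "uid_1", "2018-10-01 00:03:00"),
        (4, "uid_1", "2018-10-01 00:22:00"),
        (5, "uid_1", "2018-10-01 00:22:00"),
        (6, "uid_1", "2018-10-01 00:25:00"),
    ],
)

SESSIONS_SQL = """
select user_id, session,
       min(record_ts) as session_start,
       max(record_ts) as session_end,
       cast(strftime('%s', max(record_ts)) as integer)
         - cast(strftime('%s', min(record_ts)) as integer) as dur_seconds
from (select l.*,
             sum(case when record_ts > datetime(prev_record_ts, '+15 minutes')
                      then 1 else 0 end)
               over (partition by user_id order by record_ts, record_num) as session
      from (select l.*,
                   coalesce(lag(record_ts) over (partition by user_id
                                                 order by record_ts, record_num),
                            record_ts) as prev_record_ts
            from log l) l
     ) l
group by user_id, session
order by user_id, session
"""

rows = conn.execute(SESSIONS_SQL).fetchall()
for row in rows:
    print(row)
# ('uid_1', 0, '2018-10-01 00:01:00', '2018-10-01 00:03:00', 120)
# ('uid_1', 1, '2018-10-01 00:22:00', '2018-10-01 00:25:00', 180)
```

Ordering the windows by `record_num` as well as `record_ts` is an added tie-breaker so that the two 12:22 records are processed in a deterministic order.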
The average is one more step:
```sql
with s as (
      select user_id, session,
             min(record_ts) as session_start,
             max(record_ts) as session_end,
             timestamp_diff(max(record_ts), min(record_ts), second) as dur_seconds
      from (select l.*,
                   countif(record_ts > timestamp_add(prev_record_ts, interval 15 minute))
                     over (partition by user_id order by record_ts) as session
            from (select l.*,
                         lag(record_ts, 1, record_ts) over (partition by user_id order by record_ts) as prev_record_ts
                  from log l
                 ) l
           ) l
      group by user_id, session
     )
select user_id, avg(dur_seconds) as avg_dur_seconds
from s
group by user_id;
```
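The averaging step can be checked the same way. This is again a SQLite translation under the same assumptions (SQLite 3.25+, placeholder date, `sum(case ...)` and `coalesce(lag(...), ...)` in place of the BigQuery functions), with the per-session query moved into the `s` CTE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table log (record_num integer, user_id text, record_ts text)")
conn.executemany(
    "insert into log values (?, ?, ?)",
    [
        (1, "uid_1", "2018-10-01 00:01:00"),
        (2, "uid_1", "2018-10-01 00:02:00"),
        (3, "uid_1", "2018-10-01 00:03:00"),
        (4, "uid_1", "2018-10-01 00:22:00"),
        (5, "uid_1", "2018-10-01 00:22:00"),
        (6, "uid_1", "2018-10-01 00:25:00"),
    ],
)

AVG_SQL = """
with s as (
  select user_id, session,
         cast(strftime('%s', max(record_ts)) as integer)
           - cast(strftime('%s', min(record_ts)) as integer) as dur_seconds
  from (select l.*,
               sum(case when record_ts > datetime(prev_record_ts, '+15 minutes')
                        then 1 else 0 end)
                 over (partition by user_id order by record_ts, record_num) as session
        from (select l.*,
                     coalesce(lag(record_ts) over (partition by user_id
                                                   order by record_ts, record_num),
                              record_ts) as prev_record_ts
              from log l) l
       ) l
  group by user_id, session
)
select user_id, avg(dur_seconds) as avg_dur_seconds
from s
group by user_id
"""

result = conn.execute(AVG_SQL).fetchall()
print(result)  # [('uid_1', 150.0)]
```

For the sample log this gives 150 seconds, the mean of the 120 s and 180 s sessions. Note that a session consisting of a single record has a duration of 0 and still pulls the average down; whether to keep such sessions is a modelling choice.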
Answer 1 (score: 0)
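Assuming you also have the date (otherwise a session that runs past midnight would appear to end before it starts), you can do something like this: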
```sql
WITH CTE AS
  (SELECT * FROM
     (SELECT 1 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:01:00') record_ts)
   UNION ALL
     (SELECT 2 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:02:00') record_ts)
   UNION ALL
     (SELECT 3 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:03:00') record_ts)
   UNION ALL
     (SELECT 4 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:22:00') record_ts)
   UNION ALL
     (SELECT 5 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:22:00') record_ts)
   UNION ALL
     (SELECT 6 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:25:00') record_ts)
   UNION ALL
     (SELECT 7 record_num, "uid_1" user_id, TIMESTAMP('2018-10-01 12:59:00') record_ts)),
sessions AS
  (SELECT
     -- 1 marks the first record of a session: a gap of >= 15 minutes, or no previous record
     IF(TIMESTAMP_DIFF(record_ts,
                       LAG(record_ts, 1) OVER (PARTITION BY user_id ORDER BY record_ts, record_num),
                       MINUTE) >= 15
        OR LAG(record_ts, 1) OVER (PARTITION BY user_id ORDER BY record_ts, record_num) IS NULL,
        1, 0) session,
     record_num, user_id, record_ts
   FROM CTE)
-- running sum of the start flags numbers the sessions per user
SELECT SUM(session) OVER (PARTITION BY user_id ORDER BY record_ts, record_num) sessionNo,
       record_num, user_id, record_ts
FROM sessions
```
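A SQLite sketch of this flag-then-running-sum approach, run from Python on the same seven records (an in-memory `log` table stands in for the answer's CTE). It assumes SQLite 3.25+ for window functions; `case` replaces BigQuery's `IF`, and whole minutes come from integer division of the epoch-second difference, mirroring how `TIMESTAMP_DIFF(..., MINUTE)` counts whole minutes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table log (record_num integer, user_id text, record_ts text)")
minutes = [1, 2, 3, 22, 22, 25, 59]
conn.executemany(
    "insert into log values (?, ?, ?)",
    [(i, "uid_1", f"2018-10-01 12:{m:02d}:00") for i, m in enumerate(minutes, start=1)],
)

SESSION_NO_SQL = """
with sessions as (
  select case
           -- the first record of a user always opens a session
           when lag(record_ts) over (partition by user_id order by record_ts, record_num) is null
             then 1
           -- whole minutes since the previous record
           when (cast(strftime('%s', record_ts) as integer)
                 - cast(strftime('%s', lag(record_ts) over (partition by user_id
                                                            order by record_ts, record_num))
                        as integer)) / 60 >= 15
             then 1
           else 0
         end as session,
         record_num, user_id, record_ts
  from log
)
select sum(session) over (partition by user_id order by record_ts, record_num) as session_no,
       record_num, user_id, record_ts
from sessions
order by user_id, record_ts, record_num
"""

rows = conn.execute(SESSION_NO_SQL).fetchall()
for row in rows:
    print(row)
```

With the extra 12:59 record, the session numbers come out as 1, 1, 1, 2, 2, 2, 3: the 34-minute gap before record 7 opens a third session.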
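The key parameter is the number of minutes required between sessions; in the case above I set it to 15 minutes (>= 15). It may also be useful to concatenate the session number with the user_id and the session start time to create a unique session identifier.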
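One possible way to build such an identifier, shown in plain Python on rows shaped like the query output above. The `user_id|sessionNo|start` format and the hard-coded rows are hypothetical choices for illustration, not something the answer prescribes:

```python
from datetime import datetime

# Hypothetical rows shaped like the query output: (sessionNo, user_id, record_ts).
rows = [
    (1, "uid_1", datetime(2018, 10, 1, 12, 1)),
    (1, "uid_1", datetime(2018, 10, 1, 12, 2)),
    (1, "uid_1", datetime(2018, 10, 1, 12, 3)),
    (2, "uid_1", datetime(2018, 10, 1, 12, 22)),
    (2, "uid_1", datetime(2018, 10, 1, 12, 22)),
    (2, "uid_1", datetime(2018, 10, 1, 12, 25)),
    (3, "uid_1", datetime(2018, 10, 1, 12, 59)),
]

# Session start = earliest record_ts per (user_id, sessionNo).
session_start = {}
for session_no, user_id, ts in rows:
    key = (user_id, session_no)
    session_start[key] = min(ts, session_start.get(key, ts))

# Join user_id, session number and start time into one identifier per row.
session_ids = [
    f"{user_id}|{session_no}|{session_start[(user_id, session_no)]:%Y-%m-%dT%H:%M:%S}"
    for session_no, user_id, _ in rows
]

print(sorted(set(session_ids)))
# ['uid_1|1|2018-10-01T12:01:00', 'uid_1|2|2018-10-01T12:22:00', 'uid_1|3|2018-10-01T12:59:00']
```

Including the start time keeps the identifier unique even if session numbering is later restarted, for example per day.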