I have a table that looks like this:
user_id page happened_at
2 'page3' 2017-10-05 11:31
1 'page2' 2016-02-01 00:02
2 'page1' 2017-10-05 15:24
3 'page3' 2017-03-31 19:35
4 'page1' 2017-07-09 00:24
2 'page3' 2017-10-05 15:28
1 'page3' 2018-02-01 13:02
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
etc
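I have a query that identifies user sessions in which a user opened page1, page2 and page3 in that order, each within an hour of the previous one (page2 within an hour of page1, and page3 within an hour of page2). Any pages opened in between can be ignored. An example of a session from the table above: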
user_id page happened_at
2 'page1' 2017-10-05 15:24
2 'page2' 2017-10-05 16:14
2 'page3' 2017-10-05 16:34
So far my query looks like this; it returns the user_id of each user who has such a session:
select user_id
from (
    select user_id, page, happened_at,
           lag(page) over (partition by user_id order by happened_at) as prev_page,
           lead(page) over (partition by user_id order by happened_at) as next_page,
           datediff(minute, lag(happened_at) over (partition by user_id order by happened_at), happened_at) as time_diff_with_prev_action,
           datediff(minute, happened_at, lead(happened_at) over (partition by user_id order by happened_at)) as time_diff_with_next_action
    from tbl
) t
where page = 'page2' and prev_page = 'page1' and next_page = 'page3'
  and time_diff_with_prev_action <= 60 and time_diff_with_next_action <= 60
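What I need is to edit the query so that the output has two more columns: the session start time and the session end time, where the end is the time of the last action + 1 hour. Please advise how to do this. Temporary tables are not allowed, so it has to be a single query. The sample output should be: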
user_id session_start session_end
2 2017-10-05 15:24 2017-10-05 17:34
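One possible shape for such a query, given as an untested sketch rather than a definitive answer: it swaps the lag/lead approach for two self-joins on tbl, so that pages opened in between are ignored. It assumes dateadd(hour, 1, ...) is available in the same dialect as the datediff(minute, ...) call above; the aliases p1, p2 and p3 are illustrative names only.

```sql
-- Sketch only: assumes dateadd() exists in the same dialect as datediff() above.
-- p1, p2, p3 are illustrative aliases for the page1, page2 and page3 visits.
select p1.user_id,
       p1.happened_at                   as session_start,
       dateadd(hour, 1, p3.happened_at) as session_end  -- last action + 1 hour
from tbl p1
join tbl p2
  on  p2.user_id = p1.user_id
  and p2.page = 'page2'
  and p2.happened_at >  p1.happened_at                    -- page2 after page1
  and p2.happened_at <= dateadd(hour, 1, p1.happened_at)  -- within one hour
join tbl p3
  on  p3.user_id = p2.user_id
  and p3.page = 'page3'
  and p3.happened_at >  p2.happened_at                    -- page3 after page2
  and p3.happened_at <= dateadd(hour, 1, p2.happened_at)  -- within one hour
where p1.page = 'page1'
```

On the sample rows above, only user 2 has a page1 visit (15:24) followed by page2 (16:14) and page3 (16:34) inside the time windows. If a user has several matching page2 or page3 visits, this sketch would return more than one row for the same session start, so it may need a group by with min/max on top.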
Thank you for your time!