How do I get the rows that fall within a specific sequence of values?

Time: 2018-09-21 07:59:43

Tags: sql amazon-redshift redash

I need to find sessions in which a sequence of page values occurs in a specific order.

Right now I have this query, which returns each user's sessions from the raw data:

select user_id, page, happened_at 
from db as u1 
where  exists 
  (select 1 from db as u2 
   where u1.user_id = u2.user_id 
   and u2.happened_at < (u1.happened_at + interval '1 hour') )
ORDER BY user_id, happened_at
LIMIT 100

I need a result set produced by a subquery that runs over the groups of sessions returned by the query above.

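The subquery condition is to pick a session in which the user opened pages in a specific order. For example, the page values could be start, work, buying, landing, and the user must visit each of these pages in that order during the session. He/she may still hit any other pages in between, but all of these pages must be visited in this order within the session.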
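The condition described above amounts to an ordered-subsequence check on the pages of one session. Below is a minimal Python sketch of that check, not a Redshift solution; the function name `visits_in_order` and the example lists are illustrative assumptions, and the pages are assumed to be already sorted by happened_at.

```python
def visits_in_order(pages, required):
    """Return True if `required` occurs within `pages` in the same order,
    with any number of other pages allowed in between."""
    remaining = iter(pages)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step is only searched for after the previous one was found.
    return all(step in remaining for step in required)


required = ["start", "work", "buying", "landing"]

# Pages of one session, already ordered by happened_at.
session = ["start", "preview", "work", "buying", "landing"]

print(visits_in_order(session, required))                                  # True
print(visits_in_order(["work", "start", "buying", "landing"], required))   # False
```

The second call returns False because work only appears before start, so the required order is never completed.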

How can I get output that satisfies this condition? How can I detect that the page values change in a specific order during a session?

Data set returned by the first query:

user_id  page      happened_at
3,230    start     2017-03-01 15:10
3,230    work      2017-03-01 15:16
3,230    start     2017-03-01 15:16
3,230    preview   2017-03-01 17:12
3,230    work      2017-03-01 17:12
3,230    buying    2017-03-01 17:13
3,230    landing   2017-03-01 17:51
3,230    smt else  2017-03-01 17:52
3,230    any page  2017-03-01 17:56
3,230    lanidng   2017-03-01 18:03

Output (what I'm looking for now):

user_id  page      happened_at
3,230    start     2017-03-01 15:16
3,230    preview   2017-03-01 17:12
3,230    work      2017-03-01 17:12
3,230    buying    2017-03-01 17:13
3,230    landing   2017-03-01 17:51

Final result:

user_id  session_start     session_end
3,230    2017-03-01 15:16  2017-03-01 18:03

1 answer:

Answer 0 (score: 0)

Use a window function to get the first output:

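select user_id, page, happened_at
from (
    -- rn = 1 marks the latest row for each (user_id, page) pair
    select user_id, page, happened_at,
           row_number() over (partition by user_id, page order by happened_at desc) as rn
    from db  -- table name taken from the question; "table" itself is a reserved word
) t
where rn = 1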

Then take max() and min() over the output of that query:

with t1 as (
    -- latest row for each (user_id, page) pair
    select user_id, page, happened_at
    from (
        select user_id, page, happened_at,
               row_number() over (partition by user_id, page order by happened_at desc) as rn
        from db  -- table name taken from the question; "table" itself is a reserved word
    ) t
    where rn = 1
)
select user_id,
       min(happened_at) as session_start,
       max(happened_at) as session_end
from t1
group by user_id
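To see what these two steps compute, here is a rough Python sketch that mirrors them on the sample rows from the question: keep the latest row per (user_id, page), then take the earliest and latest of those timestamps per user. It is an illustration only, not the answer's code; the variable names are made up, and 3,230 is assumed to be the integer 3230 shown with a thousands separator.

```python
from datetime import datetime

# Sample rows from the question (user_id assumed to be 3230).
rows = [
    (3230, "start", "2017-03-01 15:10"),
    (3230, "work", "2017-03-01 15:16"),
    (3230, "start", "2017-03-01 15:16"),
    (3230, "preview", "2017-03-01 17:12"),
    (3230, "work", "2017-03-01 17:12"),
    (3230, "buying", "2017-03-01 17:13"),
    (3230, "landing", "2017-03-01 17:51"),
    (3230, "smt else", "2017-03-01 17:52"),
    (3230, "any page", "2017-03-01 17:56"),
    (3230, "lanidng", "2017-03-01 18:03"),
]

# Step 1: latest happened_at per (user_id, page), like "rn = 1" with a desc order.
latest = {}
for user_id, page, ts in rows:
    when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    key = (user_id, page)
    if key not in latest or when > latest[key]:
        latest[key] = when

# Step 2: min and max of those timestamps per user, like the group by.
bounds = {}
for (user_id, _page), when in latest.items():
    start, end = bounds.get(user_id, (when, when))
    bounds[user_id] = (min(start, when), max(end, when))

for user_id, (start, end) in bounds.items():
    print(user_id, start, end)
# 3230 2017-03-01 15:16:00 2017-03-01 18:03:00
```

On this sample the bounds match the final result in the question. Note that step 1 keeps every distinct page, including smt else and any page, and nothing here checks the order in which the pages were visited.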