Suppose I have a table:
date | user_id | user_last_name | order_id | is_new_session
------------+------------+----------------+-----------+---------------
2014-09-01 | A | B | 1 | t
2014-09-01 | A | B | 5 | f
2014-09-02 | A | B | 8 | t
2014-09-01 | B | B | 2 | t
2014-09-02 | B | test | 3 | t
2014-09-03 | B | test | 4 | t
2014-09-04 | B | test | 6 | t
2014-09-04 | B | test | 7 | f
2014-09-05 | B | test | 9 | t
2014-09-05 | B | test | 10 | f
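I want to derive another column in Redshift that assigns a session number to each user's sessions. It starts at 1 on each user's first record and, moving down the rows, increments whenever the "is_new_session" column is true and stays the same when it is false. When it reaches a new user, the value resets to 1. The desired output for this table is: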
1
1
2
1
2
3
4
4
5
5
It seems to me that it is something similar to SUM(1) over (Partition BY user_id, is_new_session ORDER BY user_id, date ASC)
Any ideas?
Thanks!
Answer 0 (score: 2)
I think you want a cumulative sum:
select t.*,
sum(case when is_new_session then 1 else 0 end) over (partition by user_id order by date) as session_number
from t;
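To make the idea concrete, here is a minimal Python sketch of what a cumulative sum partitioned by user_id computes. It is an illustration only, not Redshift code. The helper name `session_numbers` is hypothetical, and the rows are assumed to be sorted already by user, then date, then order_id.

```python
from itertools import accumulate, groupby

# Sample rows from the question: (date, user_id, order_id, is_new_session).
# Assumption: already sorted by user_id, then date, then order_id.
rows = [
    ("2014-09-01", "A", 1, True),
    ("2014-09-01", "A", 5, False),
    ("2014-09-02", "A", 8, True),
    ("2014-09-01", "B", 2, True),
    ("2014-09-02", "B", 3, True),
    ("2014-09-03", "B", 4, True),
    ("2014-09-04", "B", 6, True),
    ("2014-09-04", "B", 7, False),
    ("2014-09-05", "B", 9, True),
    ("2014-09-05", "B", 10, False),
]


def session_numbers(rows):
    """Running count of is_new_session flags, restarted for each user_id."""
    result = []
    # groupby plays the role of "partition by user_id"
    for _, group in groupby(rows, key=lambda r: r[1]):
        flags = (1 if r[3] else 0 for r in group)
        # accumulate plays the role of the running sum() over the partition
        result.extend(accumulate(flags))
    return result


print(session_numbers(rows))
```

Each true flag adds 1 to the running total and each false flag adds 0, which is why the number only moves when a new session starts.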
In Redshift, you may need the window frame clause:
select t.*,
sum(case when is_new_session then 1 else 0 end) over
(partition by user_id
order by date
rows between unbounded preceding and current row
) as session_number
from t;
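One caveat with the explicit rows frame: several rows share the same date for one user, so ordering by date alone leaves their relative order undefined, and a row with is_new_session = f could be counted before the row that opened its session. The sketch below adds order_id as a tie-breaker, which is an assumption (that order_id increases in event order within a user), not something the question states. It runs the query in SQLite through Python's `sqlite3` module as a stand-in for Redshift, so the boolean is stored as 0/1, and it needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (date text, user_id text, user_last_name text,"
    " order_id integer, is_new_session integer)"
)
# Sample data from the question; t/f are stored as 1/0 for SQLite.
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        ("2014-09-01", "A", "B", 1, 1),
        ("2014-09-01", "A", "B", 5, 0),
        ("2014-09-02", "A", "B", 8, 1),
        ("2014-09-01", "B", "B", 2, 1),
        ("2014-09-02", "B", "test", 3, 1),
        ("2014-09-03", "B", "test", 4, 1),
        ("2014-09-04", "B", "test", 6, 1),
        ("2014-09-04", "B", "test", 7, 0),
        ("2014-09-05", "B", "test", 9, 1),
        ("2014-09-05", "B", "test", 10, 0),
    ],
)

# Same query as above, with order_id added to the window ordering
# (assumed tie-breaker) and a final order by for a stable result order.
query = """
select order_id,
       sum(case when is_new_session then 1 else 0 end) over
           (partition by user_id
            order by date, order_id
            rows between unbounded preceding and current row
           ) as session_number
from t
order by user_id, date, order_id
"""
session_numbers = [row[1] for row in conn.execute(query)]
print(session_numbers)
```

If order_id does not reflect event order in your data, a timestamp column would be a safer tie-breaker.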