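I'm looking for a way to add a user-agnostic "session" column (session IDs numbered across all users, not restarting per user) based on the following conditions:
1) Each user defines their own sessions.
2) A time gap of more than 10 minutes starts a new session for a given user.
So, given this input: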
| user_id | datetime_col |
------------------------------
|1 | 01/01/2020 13:00|
|1 | 01/01/2020 13:01|
|1 | 01/01/2020 13:02|
|1 | 01/01/2020 13:20|
|1 | 01/01/2020 13:21|
|1 | 01/01/2020 13:22|
|1 | 01/01/2020 13:23|
|2 | 01/01/2020 13:00|
|2 | 01/01/2020 13:01|
|2 | 01/01/2020 13:02|
|2 | 01/01/2020 13:03|
|2 | 01/01/2020 13:04|
|3 | 01/01/2020 13:00|
|3 | 01/01/2020 13:01|
|3 | 01/01/2020 13:02|
|3 | 01/01/2020 13:03|
|3 | 01/01/2020 13:04|
I want the following output:
| user_id | datetime_col | session_id|
------------------------------------------
|1 | 01/01/2020 13:00| 0 |
|1 | 01/01/2020 13:01| 0 |
|1 | 01/01/2020 13:02| 0 |
|1 | 01/01/2020 13:20| 1 |
|1 | 01/01/2020 13:21| 1 |
|1 | 01/01/2020 13:22| 1 |
|1 | 01/01/2020 13:23| 1 |
|2 | 01/01/2020 13:00| 2 |
|2 | 01/01/2020 13:01| 2 |
|2 | 01/01/2020 13:02| 2 |
|2 | 01/01/2020 13:03| 2 |
|2 | 01/01/2020 13:04| 2 |
|3 | 01/01/2020 13:00| 3 |
|3 | 01/01/2020 13:01| 3 |
|3 | 01/01/2020 13:02| 3 |
|3 | 01/01/2020 13:03| 3 |
|3 | 01/01/2020 13:04| 3 |
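At a high level, I want to compute the time difference between consecutive rows for each user, then increment a counter every time user_id changes or the gap is greater than 10 minutes. I can do something like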
SELECT *,
DATE_PART('minutes', datetime_col - LAG(datetime_col, 1) OVER (PARTITION BY user_id ORDER BY datetime_col)) AS grp
FROM Table1
and get
| user_id | datetime_col | grp |
------------------------------------------
|1 | 01/01/2020 13:00| (null) |
|1 | 01/01/2020 13:01| 1 |
|1 | 01/01/2020 13:02| 1 |
|1 | 01/01/2020 13:20| 18 |
|1 | 01/01/2020 13:21| 1 |
|1 | 01/01/2020 13:22| 1 |
|1 | 01/01/2020 13:23| 1 |
|2 | 01/01/2020 13:00| (null) |
|2 | 01/01/2020 13:01| 1 |
|2 | 01/01/2020 13:02| 1 |
|2 | 01/01/2020 13:03| 1 |
|2 | 01/01/2020 13:04| 1 |
|3 | 01/01/2020 13:00| (null) |
|3 | 01/01/2020 13:01| 1 |
|3 | 01/01/2020 13:02| 1 |
|3 | 01/01/2020 13:03| 1 |
|3 | 01/01/2020 13:04| 1 |
But from here I'm stuck. How should I proceed?
Answer 0 (score: 0)
You can use lag() and a cumulative sum to compute the sessions separately for each user:
select t.*,
count(*) filter (where prev_dt < datetime_col - interval '10 minute') over (partition by user_id order by datetime_col) as session_id
from (select t.*,
lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
from table1 t
) t;
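lag() fetches the previous date/time for the same user. Then only the transitions where the gap is more than 10 minutes are counted toward session_id, so session_id starts at 0 for each user and goes up by one at every such gap.
This makes the most sense to me.
However, you seem to want the sessions numbered by user_id first and then by date/time (a single overall numbering by date/time would make more sense to me). That seems strange, but you can use dense_rank():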
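The same idea can be spelled out as an explicit flag plus a running sum, which some people find easier to read than count(*) filter. This is only a sketch for PostgreSQL: sample_events and is_new_session are made-up names, and the inline rows are a trimmed stand-in for table1 so the query runs on its own.

```sql
-- Hypothetical inline sample standing in for table1 (assumption, not the real data).
with sample_events (user_id, datetime_col) as (
    values (1, timestamp '2020-01-01 13:00'),
           (1, timestamp '2020-01-01 13:02'),
           (1, timestamp '2020-01-01 13:20'),
           (1, timestamp '2020-01-01 13:21'),
           (2, timestamp '2020-01-01 13:00')
),
flagged as (
    select e.*,
           -- 1 when the gap to the user's previous row exceeds 10 minutes.
           -- For a user's first row lag() is null, the comparison is null,
           -- and the case falls through to 0.
           case when datetime_col
                     - lag(datetime_col) over (partition by user_id order by datetime_col)
                     > interval '10 minutes'
                then 1 else 0
           end as is_new_session
    from sample_events e
)
select f.*,
       -- Running total of the flags within each user = per-user session number.
       sum(is_new_session) over (partition by user_id order by datetime_col) as session_id
from flagged f
order by user_id, datetime_col;
```

With these sample rows, user 1 should come out as sessions 0, 0, 1, 1 and user 2 as session 0. Comparing the whole interval to interval '10 minutes' also sidesteps date_part('minutes', ...), which returns only the minutes field of the interval and so would wrap around for gaps of an hour or more.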
with s as (
select t.*,
count(*) filter (where prev_dt < datetime_col - interval '10 minute') over (partition by user_id order by datetime_col) as session_id
from (select t.*,
lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
from table1 t
) t
)
select s.*,
dense_rank() over (order by user_id, session_id) - 1 as new_session_id
from s
order by user_id, datetime_col;
Here is a db<>fiddle.
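If the global numbering is all you need, a possible alternative is to flag every session start (a user's first row or a gap over 10 minutes) and take one running sum over all rows, skipping dense_rank(). This is a sketch only, assuming PostgreSQL, the table1 layout from the question, and no null datetime_col values; starts_session is a made-up column name.

```sql
with flagged as (
    select t.*,
           case
               -- First row for this user: no previous row, so a session starts here.
               when lag(datetime_col) over w is null then 1
               -- Gap of more than 10 minutes since the user's previous row.
               when datetime_col - lag(datetime_col) over w > interval '10 minutes' then 1
               else 0
           end as starts_session
    from table1 t
    window w as (partition by user_id order by datetime_col)
)
select f.*,
       -- Running count of session starts across all users, shifted to begin at 0.
       sum(starts_session) over (order by user_id, datetime_col) - 1 as new_session_id
from flagged f
order by user_id, datetime_col;
```

On the data in the question this should give user 1 the ids 0 and 1, user 2 the id 2, and user 3 the id 3, matching the requested output. The trade-off is that you no longer get the per-user session_id as a by-product.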