Window function to increment a column based on a logical condition

Time: 2020-05-17 10:10:44

Tags: sql postgresql window-functions

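I am looking for a way to add a user-agnostic "session" column based on the following criteria.

1) Each user defines their own sessions.

2) A gap of more than 10 minutes starts a new session for a given user.

So, given this input: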

| user_id | datetime_col     |
------------------------------
|1        |  01/01/2020 13:00|
|1        |  01/01/2020 13:01|
|1        |  01/01/2020 13:02|
|1        |  01/01/2020 13:20|
|1        |  01/01/2020 13:21|
|1        |  01/01/2020 13:22|
|1        |  01/01/2020 13:23|
|2        |  01/01/2020 13:00|
|2        |  01/01/2020 13:01|
|2        |  01/01/2020 13:02|
|2        |  01/01/2020 13:03|
|2        |  01/01/2020 13:04|
|3        |  01/01/2020 13:00|
|3        |  01/01/2020 13:01|
|3        |  01/01/2020 13:02|
|3        |  01/01/2020 13:03|
|3        |  01/01/2020 13:04|

I would like the following output:

| user_id | datetime_col     | session_id|
------------------------------------------
|1        |  01/01/2020 13:00|     0     |
|1        |  01/01/2020 13:01|     0     |
|1        |  01/01/2020 13:02|     0     |
|1        |  01/01/2020 13:20|     1     |
|1        |  01/01/2020 13:21|     1     |
|1        |  01/01/2020 13:22|     1     |
|1        |  01/01/2020 13:23|     1     |
|2        |  01/01/2020 13:00|     2     |
|2        |  01/01/2020 13:01|     2     |
|2        |  01/01/2020 13:02|     2     |
|2        |  01/01/2020 13:03|     2     |
|2        |  01/01/2020 13:04|     2     |
|3        |  01/01/2020 13:00|     3     |
|3        |  01/01/2020 13:01|     3     |
|3        |  01/01/2020 13:02|     3     |
|3        |  01/01/2020 13:03|     3     |
|3        |  01/01/2020 13:04|     3     |

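At a high level, I want to compute the time difference between consecutive rows for each user, and then increment a counter each time user_id changes or the gap is greater than 10 minutes. I can do something like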

SELECT *,
DATE_PART('minutes', datetime_col - LAG(datetime_col, 1) OVER (PARTITION BY user_id ORDER BY datetime_col)) AS grp
FROM Table1

and get

| user_id | datetime_col     | grp       |
------------------------------------------
|1        |  01/01/2020 13:00|   (null)  |
|1        |  01/01/2020 13:01|     1     |
|1        |  01/01/2020 13:02|     1     |
|1        |  01/01/2020 13:20|    18     |
|1        |  01/01/2020 13:21|     1     |
|1        |  01/01/2020 13:22|     1     |
|1        |  01/01/2020 13:23|     1     |
|2        |  01/01/2020 13:00|   (null)  |
|2        |  01/01/2020 13:01|     1     |
|2        |  01/01/2020 13:02|     1     |
|2        |  01/01/2020 13:03|     1     |
|2        |  01/01/2020 13:04|     1     |
|3        |  01/01/2020 13:00|   (null)  |
|3        |  01/01/2020 13:01|     1     |
|3        |  01/01/2020 13:02|     1     |
|3        |  01/01/2020 13:03|     1     |
|3        |  01/01/2020 13:04|     1     |

But from here I am stuck. How do I proceed?
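A side note on the attempt above, as a small sketch rather than part of the original question: as far as I know, DATE_PART('minutes', ...) on an interval returns only the minutes field, not the total number of minutes. The sample data hides this because every gap is under an hour. The interval literal below is made up purely for illustration.

```sql
-- date_part('minute', ...) reads only the minutes field of the interval,
-- so a gap of 1 hour 5 minutes reports 5 rather than 65.
select date_part('minute', interval '1 hour 5 minutes')     as minutes_field,
       -- converting to seconds first gives the total gap in minutes
       extract(epoch from interval '1 hour 5 minutes') / 60 as total_minutes;
```

Comparing the timestamps directly against an interval, for example prev_dt < datetime_col - interval '10 minute', avoids the conversion altogether.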

1 Answer:

Answer 0: (score: 0)

You can use lag() and a cumulative count to calculate the sessions separately for each user:

select t.*,
       count(*) filter (where prev_dt < datetime_col - interval '10 minute') over (partition by user_id order by datetime_col) as session_id
from (select t.*,
             lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
      from table1 t
     ) t;

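lag() fetches the previous timestamp for the same user. The cumulative count then includes only the rows whose gap to the previous row is more than 10 minutes, so session_id starts at 0 for each user and goes up by one at every such gap.

This makes the most sense to me.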
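To make the per-user numbering concrete, here is a self-contained sketch that should run as-is in PostgreSQL. It inlines a few rows from the question as a CTE named table1 (an assumption made so the snippet has no external dependencies) and writes the running count as sum(case ...), which should be equivalent to count(*) filter (...) here.

```sql
-- A subset of the question's sample rows, inlined so the query is self-contained.
with table1 (user_id, datetime_col) as (
      values (1, timestamp '2020-01-01 13:00'),
             (1, timestamp '2020-01-01 13:02'),
             (1, timestamp '2020-01-01 13:20'),
             (1, timestamp '2020-01-01 13:21'),
             (2, timestamp '2020-01-01 13:00'),
             (2, timestamp '2020-01-01 13:01')
     )
select t.*,
       -- running total of "gap > 10 minutes" flags within each user;
       -- the first row of a user has a NULL prev_dt and therefore contributes 0
       sum(case when prev_dt < datetime_col - interval '10 minute' then 1 else 0 end)
           over (partition by user_id order by datetime_col) as session_id
from (select t.*,
             -- previous timestamp for the same user, NULL on the user's first row
             lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
      from table1 t
     ) t
order by user_id, datetime_col;
```

With these rows, user 1 gets session_id 0, 0, 1, 1 and user 2 gets 0, 0, because the numbering restarts for every user.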

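However, you seem to want the numbering to run across users, ordered by user_id first and then by date/time (numbering overall by date/time would make more sense to me). This seems strange, but you can use dense_rank():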

with s as (
      select t.*,
             count(*) filter (where prev_dt < datetime_col - interval '10 minute') over (partition by user_id order by datetime_col) as session_id
      from (select t.*,
                   lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
            from table1 t
           ) t
     )
select s.*,
        dense_rank() over (order by user_id, session_id) - 1 as new_session_id
from  s
order by user_id, datetime_col;

Here is a db<>fiddle.
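As a possible alternative to the dense_rank() step, the rule stated in the question (increment whenever user_id changes or the gap exceeds 10 minutes) can be sketched as a single running sum over all rows. This is not the approach from the answer above, only an illustration under the same assumptions; the inlined table1 CTE is a stand-in for the real table.

```sql
-- A subset of the question's sample rows, inlined so the query is self-contained.
with table1 (user_id, datetime_col) as (
      values (1, timestamp '2020-01-01 13:00'),
             (1, timestamp '2020-01-01 13:02'),
             (1, timestamp '2020-01-01 13:20'),
             (1, timestamp '2020-01-01 13:21'),
             (2, timestamp '2020-01-01 13:00'),
             (2, timestamp '2020-01-01 13:01')
     )
select t.user_id,
       t.datetime_col,
       -- a row opens a new session when it is the user's first row (prev_dt is NULL)
       -- or when it follows a gap of more than 10 minutes; subtract 1 to start at 0
       sum(case when prev_dt is null
                  or prev_dt < datetime_col - interval '10 minute'
                then 1 else 0 end)
           over (order by user_id, datetime_col) - 1 as session_id
from (select t.*,
             lag(datetime_col) over (partition by user_id order by datetime_col) as prev_dt
      from table1 t
     ) t
order by user_id, datetime_col;
```

On these rows the result is 0, 0, 1, 1 for user 1 and 2, 2 for user 2, which matches the numbering requested in the question.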