Postgres: for each row, evaluate all successive rows under a condition

Date: 2019-03-11 17:31:29

Tags: postgresql window-functions

I have this table:

id | datetime            | row_number 

1    2018-04-09 06:27:00   1
1    2018-04-09 14:15:00   2 
1    2018-04-09 15:25:00   3
1    2018-04-09 15:35:00   4
1    2018-04-09 15:51:00   5
1    2018-04-09 17:05:00   6
1    2018-04-10 06:42:00   7 
1    2018-04-10 16:39:00   8 
1    2018-04-10 18:58:00   9
1    2018-04-10 19:41:00   10
1    2018-04-14 17:05:00   11
1    2018-04-14 17:48:00   12 
1    2018-04-14 18:57:00   13

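For each row, I want to count the successive rows whose time difference from that row is <= '01:30:00', and then restart the evaluation from the first row that does not meet the condition.

Let me try to illustrate the problem better. Using the window function lag():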

 SELECT id, datetime,
        CASE WHEN datetime - lag(datetime, 1) OVER (PARTITION BY id ORDER BY datetime)
                  < '01:30:00' THEN 1 ELSE 0 END AS count
 FROM mytable;

The result is:

id | datetime            | count 

1    2018-04-09 06:27:00   0
1    2018-04-09 14:15:00   0 
1    2018-04-09 15:25:00   1
1    2018-04-09 15:35:00   1
1    2018-04-09 15:51:00   1
1    2018-04-09 17:05:00   1
1    2018-04-10 06:42:00   0 
1    2018-04-10 16:39:00   0 
1    2018-04-10 18:58:00   0
1    2018-04-10 19:41:00   1
1    2018-04-14 17:05:00   0
1    2018-04-14 17:48:00   1 
1    2018-04-14 18:57:00   1

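But this is not what I need: I want to exclude row_number 5, because the interval between row_number 5 and row_number 2 is greater than '01:30:00', and then start a new evaluation from row_number 5. The same applies to row_number 13.

The correct output would be: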
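The two gaps for row_number 5 can be checked directly. The snippet below is only a small sketch: it uses timestamp literals copied from the table, so it needs no table, and the column aliases are made up for illustration.

```sql
-- Row 5 (15:51) compared with its immediate predecessor, row 4 (15:35),
-- and with the row that started its window, row 2 (14:15).
SELECT
    timestamp '2018-04-09 15:51:00' - timestamp '2018-04-09 15:35:00'
        AS gap_to_previous_row,   -- 00:16:00, which is all that lag() sees
    timestamp '2018-04-09 15:51:00' - timestamp '2018-04-09 14:15:00'
        AS gap_to_window_start,   -- 01:36:00, the gap that matters here
    (timestamp '2018-04-09 15:51:00' - timestamp '2018-04-09 15:35:00')
        < interval '01:30:00' AS lag_counts_it,   -- true
    (timestamp '2018-04-09 15:51:00' - timestamp '2018-04-09 14:15:00')
        < interval '01:30:00' AS should_count;    -- false
```

Because lag() compares each row only with its direct predecessor, it flags row 5 even though the row lies more than '01:30:00' after the start of its window.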

id | datetime            | count 

1    2018-04-09 06:27:00   0
1    2018-04-09 14:15:00   0 
1    2018-04-09 15:25:00   1
1    2018-04-09 15:35:00   1
1    2018-04-09 15:51:00   0
1    2018-04-09 17:05:00   1
1    2018-04-10 06:42:00   0 
1    2018-04-10 16:39:00   0 
1    2018-04-10 18:58:00   0
1    2018-04-10 19:41:00   1
1    2018-04-14 17:05:00   0
1    2018-04-14 17:48:00   1 
1    2018-04-14 18:57:00   0

So the correct count is 5.

1 answer:

Answer 0 (score: 1)

I would use a recursive query for this:

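WITH RECURSIVE tmp AS (
    -- Anchor: the row numbered 1 opens the first window and is not counted.
    -- This assumes row_number restarts at 1 for each id.
    SELECT
        id,
        datetime,
        row_number,
        0 AS counting,
        datetime AS last_start
    FROM mytable
    WHERE row_number = 1
    UNION ALL
    -- Step: take the next row and compare it with the start of the current window.
    SELECT
        t1.id,
        t1.datetime,
        t1.row_number,
        CASE
            WHEN lateral_1.counting THEN 1
            ELSE 0
        END AS counting,
        -- Keep the window start while the row is within 1h 30m; otherwise restart at this row.
        CASE
            WHEN lateral_1.counting THEN tmp.last_start
            ELSE t1.datetime
        END AS last_start
    FROM
        mytable AS t1
    INNER JOIN
        tmp ON (t1.id = tmp.id AND t1.row_number - 1 = tmp.row_number),
    -- Evaluate the condition once so that both CASE expressions can reuse it.
    LATERAL (SELECT (t1.datetime - tmp.last_start) < '1h 30m'::interval AS counting) AS lateral_1
)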
SELECT id, datetime, counting
FROM tmp
ORDER BY id, datetime;
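
The question ultimately asks for a single count (5 for the sample data). One possible way to get it is sketched below: the sample rows are loaded into a temporary table and the counting column is summed per id. The column types and the total alias are assumptions, and the LATERAL subquery is inlined only to keep the block short; the logic is otherwise meant to match the query above.

```sql
-- Sample data from the question in a temporary table (column types are assumed).
CREATE TEMP TABLE mytable (
    id         integer,
    datetime   timestamp,
    row_number integer
);

INSERT INTO mytable (id, datetime, row_number) VALUES
    (1, '2018-04-09 06:27:00', 1),
    (1, '2018-04-09 14:15:00', 2),
    (1, '2018-04-09 15:25:00', 3),
    (1, '2018-04-09 15:35:00', 4),
    (1, '2018-04-09 15:51:00', 5),
    (1, '2018-04-09 17:05:00', 6),
    (1, '2018-04-10 06:42:00', 7),
    (1, '2018-04-10 16:39:00', 8),
    (1, '2018-04-10 18:58:00', 9),
    (1, '2018-04-10 19:41:00', 10),
    (1, '2018-04-14 17:05:00', 11),
    (1, '2018-04-14 17:48:00', 12),
    (1, '2018-04-14 18:57:00', 13);

WITH RECURSIVE tmp AS (
    -- Anchor: row 1 of each id opens the first window and is not counted.
    SELECT id, datetime, row_number, 0 AS counting, datetime AS last_start
    FROM mytable
    WHERE row_number = 1
    UNION ALL
    -- Step: count the next row if it is within 1h 30m of the window start,
    -- otherwise make it the start of a new window.
    SELECT
        t1.id,
        t1.datetime,
        t1.row_number,
        CASE WHEN t1.datetime - tmp.last_start < interval '1 hour 30 minutes'
             THEN 1 ELSE 0 END,
        CASE WHEN t1.datetime - tmp.last_start < interval '1 hour 30 minutes'
             THEN tmp.last_start ELSE t1.datetime END
    FROM mytable AS t1
    JOIN tmp ON t1.id = tmp.id AND t1.row_number = tmp.row_number + 1
)
SELECT id, sum(counting) AS total   -- total is a hypothetical alias
FROM tmp
GROUP BY id
ORDER BY id;
```

For the sample data this returns one row with id 1 and total 5, since only rows 3, 4, 6, 10 and 12 are counted. Note that the comparison is strict (<), as in the queries above, whereas the question's wording says <=; no gap in the sample is exactly '01:30:00', so the result is the same either way.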