Window function query that recognizes previously seen partition values

Posted: 2016-01-26 07:01:11

Tags: postgresql

I have daily data for a user that looks like this:

the_date   | status
2015-12-01 | active
2015-12-02 | active
2015-12-03 | inactive
2015-12-04 | inactive
2015-12-05 | inactive
2015-12-06 | active
2015-12-07 | active

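I want to add a new column, days_in_current_status, that counts how many days this user has been in the current status. Two separate "active" streaks should be treated as two different statuses, so the result looks like this: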

the_date   | status   | days_in_current_status
2015-12-01 | active   | 1
2015-12-02 | active   | 2
2015-12-03 | inactive | 1
2015-12-04 | inactive | 2
2015-12-05 | inactive | 3
2015-12-06 | active   | 1
2015-12-07 | active   | 2

How can I do this?

SELECT ROW_NUMBER() OVER (PARTITION BY status ORDER BY the_date) is not enough. It numbers all "active" rows as a single sequence, so it labels the 2015-12-06 row 3 instead of 1, and the 2015-12-07 row 4 instead of 2. It would work if I could add a column that rewrites the status of the last two rows to the value 'active2'.
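The following small Python sketch makes the problem concrete. Plain lists stand in for the table, and the function name row_number_per_status is only illustrative. The sketch numbers rows per status the way ROW_NUMBER() OVER (PARTITION BY status ORDER BY the_date) does and compares the result with the desired streak-based count:

```python
from collections import defaultdict

# Sample data from the question, already sorted by the_date.
rows = [
    ("2015-12-01", "active"),
    ("2015-12-02", "active"),
    ("2015-12-03", "inactive"),
    ("2015-12-04", "inactive"),
    ("2015-12-05", "inactive"),
    ("2015-12-06", "active"),
    ("2015-12-07", "active"),
]

def row_number_per_status(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY status ORDER BY the_date)."""
    counters = defaultdict(int)
    numbers = []
    for _the_date, status in rows:
        # One counter per status value; it never resets when a streak ends.
        counters[status] += 1
        numbers.append(counters[status])
    return numbers

desired = [1, 2, 1, 2, 3, 1, 2]  # days_in_current_status from the question
for (the_date, status), got, want in zip(rows, row_number_per_status(rows), desired):
    print(the_date, status, "ROW_NUMBER:", got, "wanted:", want)
```

The last two printed lines show 3 and 4 where 1 and 2 are wanted. The second "active" streak continues the first streak's counter.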

2 answers:

Answer 0 (score: 1)

If you really do have a row for every day, you can use a recursive CTE:

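WITH RECURSIVE stat (the_date, status, days_in_current_status) AS (
  -- anchor: the first date of each status starts a count of 1
  SELECT min(the_date), status, 1
  FROM mytable
  GROUP BY status
  -- UNION (not UNION ALL) discards rows that were already produced
  UNION
  -- recursive step: join to the next calendar day and either extend
  -- the current streak or restart the count at 1
  SELECT t.the_date, t.status,
         CASE WHEN t.status = s.status THEN s.days_in_current_status + 1 ELSE 1 END
  FROM mytable t
  JOIN stat s ON s.the_date + 1 = t.the_date
)
SELECT * FROM stat
ORDER BY the_date;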

Result (psql 9.5.0):
  the_date  |  status  | days_in_current_status
------------+----------+------------------------
 2015-12-01 | active   |                      1
 2015-12-02 | active   |                      2
 2015-12-03 | inactive |                      1
 2015-12-04 | inactive |                      2
 2015-12-05 | inactive |                      3
 2015-12-06 | active   |                      1
 2015-12-07 | active   |                      2
(7 rows)

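If your dates have gaps, the join on s.the_date + 1 = t.the_date cannot step across a missing day. In that case, use generate_series() to produce every date in the range of interest.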
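Here is a plain Python sketch of the recurrence behind the recursive CTE. It is an illustration under assumptions, not the answerer's code. The function name walk_daily is hypothetical. The CTE's anchor seeds the first date of every status through GROUP BY status, but this sketch seeds only the earliest date. After that, each step moves to the next calendar day, like the join s.the_date + 1 = t.the_date:

```python
from datetime import date, timedelta

def walk_daily(rows):
    """Mimic the recursive CTE on (date, status) pairs.

    Simplification: only the earliest date is used as the anchor. Each step
    moves to the next calendar day, so the walk stops at the first missing date.
    """
    by_date = dict(rows)
    current = min(by_date)
    result = [(current, by_date[current], 1)]  # anchor row counts as day 1
    step = timedelta(days=1)
    while current + step in by_date:
        current += step
        _prev_date, prev_status, prev_days = result[-1]
        status = by_date[current]
        # Same rule as CASE WHEN t.status = s.status THEN s.days + 1 ELSE 1 END
        days = prev_days + 1 if status == prev_status else 1
        result.append((current, status, days))
    return result

statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]
rows = [(date(2015, 12, 1) + timedelta(days=i), s) for i, s in enumerate(statuses)]
print([days for _d, _s, days in walk_daily(rows)])
```

In this sketch, removing 2015-12-04 from rows makes the walk stop after 2015-12-03. That is the kind of gap the generate_series() suggestion addresses.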

Answer 1 (score: 0)

Figured it out. First, add yesterday's status to each row so it can be compared with today's status:

SELECT *,
       LAG(status) OVER (ORDER BY the_date) AS status_yesterday
FROM mytable  -- "table" is a reserved word; use your table's name

the_date   | status  | status_yesterday
2015-12-01 | active  | NULL
2015-12-02 | active  | active
2015-12-03 | inactive| active
2015-12-04 | inactive| inactive
2015-12-05 | inactive| inactive
2015-12-06 | active  | inactive
2015-12-07 | active  | active
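In list terms, LAG over the whole table ordered by date shifts the status column down by one row. The following Python sketch (the helper name lag is illustrative, and the rows are assumed to be already in date order) reproduces the status_yesterday column in the table above:

```python
statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]

def lag(values):
    """Mimic LAG(value) OVER (ORDER BY the_date) on values already in date order:
    each row gets the previous row's value and the first row gets None (NULL)."""
    return [None] + values[:-1]

status_yesterday = lag(statuses)
print(status_yesterday)
# [None, 'active', 'active', 'inactive', 'inactive', 'inactive', 'active']
```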

Then flag the rows where the two differ:

SELECT *, CASE WHEN status IS DISTINCT FROM status_yesterday THEN 1 ELSE 0 END AS transition
FROM add_yesterday_status  -- output of the previous step

the_date   | status  | status_yesterday| transition
2015-12-01 | active  | NULL            | 1
2015-12-02 | active  | active          | 0
2015-12-03 | inactive| active          | 1
2015-12-04 | inactive| inactive        | 0
2015-12-05 | inactive| inactive        | 0
2015-12-06 | active  | inactive        | 1
2015-12-07 | active  | active          | 0
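The transition column in the table above marks every row whose status differs from the previous row's, and the first row counts as a change. The Python sketch below computes the same flags. The helper name is hypothetical, and the inputs are copied from the table:

```python
statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]
status_yesterday = [None, "active", "active", "inactive", "inactive", "inactive", "active"]

def transition_flags(today, yesterday):
    """1 where the status differs from the previous row's, else 0.

    Python's != treats None as an ordinary value, so the first row (no
    yesterday) is flagged. In PostgreSQL, status <> NULL yields NULL rather
    than true, which is why the SQL needs IS DISTINCT FROM for the same effect.
    """
    return [1 if t != y else 0 for t, y in zip(today, yesterday)]

print(transition_flags(statuses, status_yesterday))
# [1, 0, 1, 0, 0, 1, 0]
```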

Then take a running sum of the transitions to build status_id:

SELECT *, SUM(transition) OVER (PARTITION BY status ORDER BY the_date) AS status_id
FROM add_transition  -- output of the previous step

the_date   | status  | status_yesterday| transition | status_id
2015-12-01 | active  | NULL            | 1          | 1
2015-12-02 | active  | active          | 0          | 1
2015-12-03 | inactive| active          | 1          | 1
2015-12-04 | inactive| inactive        | 0          | 1
2015-12-05 | inactive| inactive        | 0          | 1
2015-12-06 | active  | inactive        | 1          | 2
2015-12-07 | active  | active          | 0          | 2
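The windowed SUM keeps a separate running total for each status. Each time that status starts a new streak, its total goes up by one. This Python sketch (the function name is illustrative; the inputs are copied from the table above) reproduces the status_id column:

```python
from collections import defaultdict

statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]
transition = [1, 0, 1, 0, 0, 1, 0]

def running_sum_per_status(statuses, transitions):
    """Mimic SUM(transition) OVER (PARTITION BY status ORDER BY the_date):
    a running total of the transition flags, kept separately per status."""
    totals = defaultdict(int)
    ids = []
    for status, flag in zip(statuses, transitions):
        totals[status] += flag
        ids.append(totals[status])
    return ids

print(running_sum_per_status(statuses, transition))
# [1, 1, 1, 1, 1, 2, 2]
```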

Now each (status, status_id) pair identifies a separate streak, so the two columns together can be used as the partition key.
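The pair itself can serve as the key, so no string concatenation is needed. The Python sketch below uses a tuple and numbers the rows within each streak, mirroring ROW_NUMBER() OVER (PARTITION BY status, status_id ORDER BY the_date). The function name is illustrative, and status_id is taken from the table above:

```python
from collections import defaultdict

statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]
status_id = [1, 1, 1, 1, 1, 2, 2]

def days_in_current_status(statuses, status_ids):
    """Mimic ROW_NUMBER() OVER (PARTITION BY status, status_id ORDER BY the_date)."""
    counters = defaultdict(int)
    days = []
    for key in zip(statuses, status_ids):  # (status, status_id) names one streak
        counters[key] += 1
        days.append(counters[key])
    return days

print(days_in_current_status(statuses, status_id))
# [1, 2, 1, 2, 3, 1, 2]
```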

Putting it all together:

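WITH add_yesterday_status AS (
    SELECT the_date
    , status
    -- previous row's status, looking at the whole table in date order
    , LAG(status) OVER (
        ORDER BY the_date
      ) AS status_yesterday
    FROM mytable  -- your table's name ("table" is a reserved word)
)

, add_transition AS (
    SELECT *
    -- 1 when the status changed since yesterday; IS DISTINCT FROM also
    -- treats the first row (status_yesterday IS NULL) as a change
    , CASE WHEN status IS DISTINCT FROM status_yesterday THEN 1 ELSE 0 END AS transition
    FROM add_yesterday_status
)

, add_status_id AS (
    SELECT *
    -- running count of streaks per status: 1 for its first streak, 2 for the next, ...
    , SUM(transition) OVER (
        PARTITION BY status
        ORDER BY the_date
      ) AS status_id
    FROM add_transition
)

, add_days_in_current_status AS (
    SELECT *
    -- number the days within each (status, status_id) streak
    , ROW_NUMBER() OVER (
        PARTITION BY status, status_id
        ORDER BY the_date
      ) AS days_in_current_status
    FROM add_status_id
)

SELECT the_date
, status
, days_in_current_status
FROM add_days_in_current_status
ORDER BY the_date;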
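The combined approach can be checked end to end without a PostgreSQL server, using Python's standard-library sqlite3 module. This is a translation into SQLite's dialect, not the PostgreSQL query itself. It assumes SQLite 3.25 or newer, which added window functions, and it uses SQLite's NULL-safe IS NOT in place of PostgreSQL's IS DISTINCT FROM. The table name mytable and the helper name days_in_current_status are assumptions:

```python
import sqlite3

# SQLite translation of the combined query. Assumes SQLite 3.25+ (window
# functions). SQLite's "IS NOT" is NULL-safe, standing in for IS DISTINCT FROM.
# Dates are ISO-8601 text, which sorts chronologically.
QUERY = """
WITH add_yesterday_status AS (
    SELECT the_date, status,
           LAG(status) OVER (ORDER BY the_date) AS status_yesterday
    FROM mytable
), add_transition AS (
    SELECT *,
           CASE WHEN status IS NOT status_yesterday THEN 1 ELSE 0 END AS transition
    FROM add_yesterday_status
), add_status_id AS (
    SELECT *,
           SUM(transition) OVER (PARTITION BY status ORDER BY the_date) AS status_id
    FROM add_transition
)
SELECT the_date, status,
       ROW_NUMBER() OVER (PARTITION BY status, status_id ORDER BY the_date)
           AS days_in_current_status
FROM add_status_id
ORDER BY the_date
"""

def days_in_current_status(rows):
    """Load (the_date, status) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (the_date TEXT, status TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

rows = [
    ("2015-12-01", "active"),
    ("2015-12-02", "active"),
    ("2015-12-03", "inactive"),
    ("2015-12-04", "inactive"),
    ("2015-12-05", "inactive"),
    ("2015-12-06", "active"),
    ("2015-12-07", "active"),
]
for row in days_in_current_status(rows):
    print(row)
```

The query collapses the last CTE into the final SELECT. The logic is unchanged: the window in that SELECT is evaluated before its ORDER BY.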