I have daily data for one user, like this:
the_date | status
2015-12-01 | active
2015-12-02 | active
2015-12-03 | inactive
2015-12-04 | inactive
2015-12-05 | inactive
2015-12-06 | active
2015-12-07 | active
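I want to add a new column, days_in_current_status, which counts the number of days the user has been in the current status, but treats two separate "active" streaks as two different states, so that the result looks like this: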
the_date | status | days_in_current_status
2015-12-01 | active | 1
2015-12-02 | active | 2
2015-12-03 | inactive | 1
2015-12-04 | inactive | 2
2015-12-05 | inactive | 3
2015-12-06 | active | 1
2015-12-07 | active | 2
How can I do this?
SELECT ROW_NUMBER() OVER (PARTITION BY status ORDER BY the_date)
is not enough for me, because it would label the 2015-12-06 row as 3 and the 2015-12-07 row as 4, instead of 1 and 2. It would work if I could add a column that rewrites the status value of the last 2 rows to 'active2'.
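To make the mismatch concrete, here is a small Python sketch (not SQL) that mimics what ROW_NUMBER() OVER (PARTITION BY status ORDER BY the_date) does on the sample data. The rows list and the function name row_number_per_status are made up for this illustration.

```python
from collections import defaultdict

# Sample rows from the question, already ordered by the_date.
rows = [
    ("2015-12-01", "active"),
    ("2015-12-02", "active"),
    ("2015-12-03", "inactive"),
    ("2015-12-04", "inactive"),
    ("2015-12-05", "inactive"),
    ("2015-12-06", "active"),
    ("2015-12-07", "active"),
]


def row_number_per_status(rows):
    """Number rows within each status, ignoring which streak they belong to."""
    counters = defaultdict(int)  # one running counter per status value
    numbered = []
    for the_date, status in rows:
        counters[status] += 1
        numbered.append((the_date, status, counters[status]))
    return numbered


naive = row_number_per_status(rows)
for row in naive:
    print(*row, sep=" | ")
```

The two later "active" rows continue the count of the first streak, because the partition only knows the status and not which streak a row belongs to.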
Answer 0 (score: 1)
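If you really do have a value for every day, you can use a recursive CTE:
WITH RECURSIVE stat (the_date, status, days_in_current_status) AS (
    -- Seed: the first date of each status starts a streak at day 1.
    SELECT min(the_date), status, 1
    FROM mytable
    GROUP BY status
  UNION  -- not UNION ALL: the chain from an earlier seed reaches the later seeds again, and UNION drops those duplicates
    -- Step: move to the next day (assumes the_date is a date column, so + 1 means one day later);
    -- extend the streak if the status is unchanged, otherwise restart at 1.
    SELECT t.the_date, t.status,
           CASE WHEN t.status = s.status THEN s.days_in_current_status + 1 ELSE 1 END
    FROM mytable t
    JOIN stat s ON s.the_date + 1 = t.the_date
)
SELECT * FROM stat
ORDER BY the_date;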
Result:
psql (9.5.0)
Type "help" for help.
test=# WITH RECURSIVE stat (the_date, status, days_in_current_status) AS (
test(# SELECT min(the_date), status, 1
test(# FROM mytable
test(# GROUP BY status
test(# UNION
test(# SELECT t.the_date, t.status,
test(# CASE WHEN t.status = s.status THEN s.days_in_current_status + 1 ELSE 1 END
test(# FROM mytable t
test(# JOIN stat s ON s.the_date + 1 = t.the_date
test(# )
test-# SELECT * FROM stat
test-# ORDER BY the_date;
the_date | status | days_in_current_status
------------+----------+------------------------
2015-12-01 | active | 1
2015-12-02 | active | 2
2015-12-03 | inactive | 1
2015-12-04 | inactive | 2
2015-12-05 | inactive | 3
2015-12-06 | active | 1
2015-12-07 | active | 2
(7 rows)
If your dates have gaps, you can use generate_series() over the date range of interest.
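The recurrence that the recursive step applies, and why gaps matter, can be sketched outside SQL. This is a plain Python illustration under stated assumptions: days_in_status and the dict-based input are hypothetical names, and a missing day is treated as ending the streak, which is only one possible policy.

```python
from datetime import date, timedelta


def days_in_status(statuses):
    """statuses: dict mapping datetime.date -> status string."""
    result = {}
    prev_day = None
    for day in sorted(statuses):
        consecutive = prev_day is not None and day - prev_day == timedelta(days=1)
        if consecutive and statuses[day] == statuses[prev_day]:
            # Same status as the previous calendar day: extend the streak.
            result[day] = result[prev_day] + 1
        else:
            # First row, a status change, or (assumption) a gap in the dates.
            result[day] = 1
        prev_day = day
    return result


sample = {
    date(2015, 12, 1): "active",
    date(2015, 12, 2): "active",
    date(2015, 12, 3): "inactive",
    date(2015, 12, 4): "inactive",
    date(2015, 12, 5): "inactive",
    date(2015, 12, 6): "active",
    date(2015, 12, 7): "active",
}
counts = days_in_status(sample)
for day in sorted(counts):
    print(day, sample[day], counts[day], sep=" | ")
```

With contiguous dates this reproduces the column from the question. If a day is missing, the join on s.the_date + 1 in the CTE finds no next row, so the gaps have to be filled (or handled explicitly, as in this sketch) first.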
Answer 1 (score: 0)
Figured it out. First, add yesterday's status to each row so that it can be compared with today's status:
SELECT
*, LAG(status) OVER (PARTITION BY 1 ORDER BY the_date) AS status_yesterday
FROM mytable
the_date | status | status_yesterday
2015-12-01 | active | NULL
2015-12-02 | active | active
2015-12-03 | inactive| active
2015-12-04 | inactive| inactive
2015-12-05 | inactive| inactive
2015-12-06 | active | inactive
2015-12-07 | active | active
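Then flag the rows where they differ (the first row, whose status_yesterday is NULL, also counts as a transition):
SELECT *, CASE WHEN status = status_yesterday THEN 0 ELSE 1 END AS transition
FROM add_yesterday_status  -- the result of the previous query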
the_date | status | status_yesterday| transition
2015-12-01 | active | NULL | 1
2015-12-02 | active | active | 0
2015-12-03 | inactive| active | 1
2015-12-04 | inactive| inactive | 0
2015-12-05 | inactive| inactive | 0
2015-12-06 | active | inactive | 1
2015-12-07 | active | active | 0
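Then take a running sum of the transitions within each status to construct status_id:
SELECT *, SUM(transition) OVER (PARTITION BY status ORDER BY the_date) AS status_id
FROM add_transition  -- the result of the previous query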
the_date | status | status_yesterday| transition | status_id
2015-12-01 | active | NULL | 1 | 1
2015-12-02 | active | active | 0 | 1
2015-12-03 | inactive| active | 1 | 1
2015-12-04 | inactive| inactive | 0 | 1
2015-12-05 | inactive| inactive | 0 | 1
2015-12-06 | active | inactive | 1 | 2
2015-12-07 | active | active | 0 | 2
Now the combination of status and status_id identifies each distinct streak, and that is what to partition by.
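The three window steps can be traced in a few lines of Python. This is only a sketch of the logic, not a replacement for the SQL: streak_columns and the statuses list are hypothetical names, and the rows are assumed to be already ordered by the_date.

```python
from collections import defaultdict

# Status column from the question, ordered by the_date.
statuses = ["active", "active", "inactive", "inactive", "inactive", "active", "active"]


def streak_columns(statuses):
    """Return (transition, status_id, days_in_current_status) for each row."""
    running_sum = defaultdict(int)  # SUM(transition) OVER (PARTITION BY status ...)
    row_number = defaultdict(int)   # ROW_NUMBER() OVER (PARTITION BY status, status_id ...)
    previous = None                 # LAG(status); None plays the role of NULL
    out = []
    for status in statuses:
        transition = 0 if status == previous else 1
        running_sum[status] += transition
        status_id = running_sum[status]
        row_number[(status, status_id)] += 1
        out.append((transition, status_id, row_number[(status, status_id)]))
        previous = status
    return out


for status, cols in zip(statuses, streak_columns(statuses)):
    print(status, *cols, sep=" | ")
```

The second "active" streak gets status_id 2, so counting rows per (status, status_id) restarts at 1 there.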
Putting it all together:
WITH add_yesterday_status AS (
    SELECT the_date
         , status
         , LAG(status) OVER (
               PARTITION BY 1
               ORDER BY the_date
           ) AS status_yesterday
    FROM mytable
)
, add_transition AS (
    SELECT *
           -- 1 on the first row (comparison with NULL) and whenever the status changes
         , CASE WHEN status = status_yesterday THEN 0 ELSE 1 END AS transition
    FROM add_yesterday_status
)
, add_status_id AS (
    SELECT *
         , SUM(transition) OVER (
               PARTITION BY status
               ORDER BY the_date
           ) AS status_id
    FROM add_transition
)
, add_days_in_current_status AS (
    SELECT *
         , ROW_NUMBER() OVER (
               PARTITION BY status, status_id
               ORDER BY the_date
           ) AS days_in_current_status
    FROM add_status_id
)
SELECT the_date
     , status
     , days_in_current_status
FROM add_days_in_current_status
ORDER BY the_date