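I am new to PostgreSQL and learning more every day. I am using PostgreSQL 9.4.
I have daily data and want to create a binary variable that equals 1 when another variable (here min_flow) has been positive for at least 5 consecutive days.
The data has the following structure ("test" is the variable I want to create):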
 group_id |    date    | min_flow | test
----------+------------+----------+------
        1 | 2012-02-01 |        0 |    0
        1 | 2012-02-02 |        0 |    0
        1 | 2012-02-03 |      1.5 |    1
        1 | 2012-02-04 |        1 |    1
        1 | 2012-02-05 |      0.7 |    1
        1 | 2012-02-06 |      0.8 |    1
        1 | 2012-02-07 |      1.2 |    1
        1 | 2012-02-08 |      1.5 |    1
        1 | 2012-02-09 |        0 |    0
        1 | 2012-02-10 |        0 |    0
        1 | 2012-02-11 |      0.9 |    0
        1 | 2012-02-12 |      1.2 |    0
        1 | 2012-02-13 |        0 |    0
        1 | 2012-02-14 |        0 |    0
I achieved this using window functions:
SELECT CASE WHEN min_flow > 0
    AND
    (
        -- current row and the next 4 rows all have min_flow > 0
        ((lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
        OR (
        -- current row and the previous 4 rows all have min_flow > 0
        (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
        OR (
        -- current row, the next 3 rows and the previous 1 row all have min_flow > 0
        (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
        OR (
        -- current row, the next 2 rows and the previous 2 rows all have min_flow > 0
        (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
        OR (
        -- current row, the next 1 row and the previous 3 rows all have min_flow > 0
        (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
        AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    )
    THEN 1 ELSE 0 END AS test
FROM a_table  -- placeholder table name
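However, I would like to know whether there is a better / more efficient way to do this.
Any help is much appreciated. Thanks a lot in advance!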
Answer 0: (score: 0)
I think I would consider a nested query: first check whether the last five days contain five consecutive positive values, with:
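sum(case when min_flow > 0 then 1 else 0 end)
    over (partition by group_id
          order by _date_
          -- a row-based frame assumes one row per day with no missing dates;
          -- PostgreSQL 9.4 has no offset-based RANGE frames (PostgreSQL 11+ accepts
          -- range between '4 days'::interval preceding and current row)
          rows between 4 preceding
                   and current row) count_of_positives_in_last_5_days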
...and then look for the maximum of that value over the following five days:
max(count_of_positives_in_last_5_days)
    over (partition by group_id
          order by _date_
          -- current day plus the 4 following rows (again assuming no missing dates)
          rows between current row
                   and 4 following) max_count_of_positives_in_last_5_days
If that maximum is 5, return 1; otherwise return 0.
If you can put your data into an SQLFiddle, I can explain it better and test whether it actually works :)
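The two fragments above can be combined into one statement. The following is only a sketch under a few assumptions: the sample rows are rebuilt inline in a CTE named a_table (a name borrowed for illustration), there is exactly one row per group per day with no missing dates, and row-based frames are used so that it should also run on PostgreSQL 9.4.

```sql
-- Sample rows from the question, built inline so the sketch is self-contained.
with a_table as (
    select 1 as group_id,
           date '2012-02-01' + (i - 1) as _date_,
           flows[i] as min_flow
    from (select array[0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0] as flows) f,
         generate_series(1, 14) as i
),
-- Step 1: number of positive values in the 5-row window ending at each row.
counted as (
    select group_id, _date_, min_flow,
           sum(case when min_flow > 0 then 1 else 0 end)
               over (partition by group_id
                     order by _date_
                     rows between 4 preceding and current row)
               as count_of_positives_in_last_5_days
    from a_table
)
-- Step 2: a row belongs to a run of 5 or more positive days if any window
-- ending on that row or on one of the 4 following rows is fully positive.
select group_id, _date_, min_flow,
       (max(count_of_positives_in_last_5_days)
            over (partition by group_id
                  order by _date_
                  rows between current row and 4 following) = 5)::int as test
from counted
order by group_id, _date_;
```

On the sample data this should set test to 1 for 2012-02-03 through 2012-02-08 and to 0 elsewhere. If dates can be missing, the row-based frames no longer correspond to calendar days and the result would need to be checked differently.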
Answer 1: (score: 0)
Use the difference between row_number() over all rows and row_number() over the partitions of positive / non-positive min_flow to identify groups of consecutive rows:
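select
    group_id,
    _date_,
    min_flow,
    -- only positive rows qualify, and only inside a run of at least 5 rows
    (min_flow > 0 and count(*) over w_diff > 4)::int test
from (
    select *,
        row_number() over w_all rn_all,
        row_number() over w_pos rn_pos
    from a_table
    window
        -- numbering restarts for each group_id so that groups do not interleave
        w_all as (partition by group_id order by _date_),
        w_pos as (partition by group_id, min_flow > 0 order by _date_)
    ) s
-- the sign is part of the key: a positive and a non-positive run can share the same difference
window w_diff as (partition by group_id, min_flow > 0, rn_all - rn_pos)
order by group_id, _date_;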
A query to illustrate the method:
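select
    *,
    rn_all - rn_pos diff,
    -- only positive rows qualify, and only inside a run of at least 5 rows
    (min_flow > 0 and count(*) over w_diff > 4)::int test
from (
    select *,
        row_number() over w_all rn_all,
        row_number() over w_pos rn_pos
    from a_table
    window
        -- numbering restarts for each group_id so that groups do not interleave
        w_all as (partition by group_id order by _date_),
        w_pos as (partition by group_id, min_flow > 0 order by _date_)
    ) s
-- the sign is part of the key: a positive and a non-positive run can share the same difference
window w_diff as (partition by group_id, min_flow > 0, rn_all - rn_pos)
order by group_id, _date_;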
group_id | _date_ | min_flow | rn_all | rn_pos | diff | test
----------+------------+----------+--------+--------+------+------
1 | 2012-02-01 | 0 | 1 | 1 | 0 | 0
1 | 2012-02-02 | 0 | 2 | 2 | 0 | 0
1 | 2012-02-03 | 1.5 | 3 | 1 | 2 | 1
1 | 2012-02-04 | 1 | 4 | 2 | 2 | 1
1 | 2012-02-05 | 0.7 | 5 | 3 | 2 | 1
1 | 2012-02-06 | 0.8 | 6 | 4 | 2 | 1
1 | 2012-02-07 | 1.2 | 7 | 5 | 2 | 1
1 | 2012-02-08 | 1.5 | 8 | 6 | 2 | 1
1 | 2012-02-09 | 0 | 9 | 3 | 6 | 0
1 | 2012-02-10 | 0 | 10 | 4 | 6 | 0
1 | 2012-02-11 | 0.9 | 11 | 7 | 4 | 0
1 | 2012-02-12 | 1.2 | 12 | 8 | 4 | 0
1 | 2012-02-13 | 0 | 13 | 5 | 8 | 0
1 | 2012-02-14 | 0 | 14 | 6 | 8 | 0
(14 rows)
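The row_number() difference treats consecutive rows as consecutive days, which holds for the sample data. If some dates could be missing, one possible variant (a sketch, not part of the answer above) keeps only the positive rows and subtracts their row number from the date itself: within a run of consecutive calendar days that difference stays constant. The inline sample CTE and the names island and run_length are assumptions made for illustration, and _date_ is assumed to be of type date.

```sql
-- Sample rows from the question, built inline so the sketch is self-contained.
with a_table as (
    select 1 as group_id,
           date '2012-02-01' + (i - 1) as _date_,
           flows[i] as min_flow
    from (select array[0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0] as flows) f,
         generate_series(1, 14) as i
),
-- Positive rows only: date minus row number is the same for every row
-- of a run of consecutive calendar days, and changes after any gap.
pos as (
    select group_id, _date_,
           _date_ - (row_number() over (partition by group_id order by _date_))::int as island
    from a_table
    where min_flow > 0
),
-- Length of the run each positive row belongs to.
runs as (
    select group_id, _date_,
           count(*) over (partition by group_id, island) as run_length
    from pos
)
-- Non-positive rows find no match in runs and therefore get 0.
select t.group_id, t._date_, t.min_flow,
       (coalesce(r.run_length, 0) > 4)::int as test
from a_table t
left join runs r using (group_id, _date_)
order by t.group_id, t._date_;
```

On the sample data the positive rows 2012-02-03 through 2012-02-08 share one island of length 6, and 2012-02-11 and 2012-02-12 share another of length 2, so only the first run should get test = 1.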
Answer 2: (score: 0)
Naive counting, assuming the dates are unique within each group_id (that is, (group_id, zdate) is treated as a candidate key):
SELECT m.*, EXISTS(
SELECT 1 FROM meuk x
WHERE x.group_id = m.group_id
AND x.zdate >= m.zdate - '4 day'::interval
AND x.zdate <= m.zdate
AND x.min_flow > 0
GROUP BY x.group_id
HAVING COUNT(*) >= 5
) AS valid_for_five_days
FROM meuk m
;
Result:
group_id | zdate | min_flow | test | valid_for_five_days
----------+------------+----------+------+---------------------
1 | 2012-02-01 | 0 | f | f
1 | 2012-02-02 | 0 | f | f
1 | 2012-02-03 | 1.5 | t | f
1 | 2012-02-04 | 1 | t | f
1 | 2012-02-05 | 0.7 | t | f
1 | 2012-02-06 | 0.8 | t | f
1 | 2012-02-07 | 1.2 | t | t
1 | 2012-02-08 | 1.5 | t | t
1 | 2012-02-09 | 0 | f | f
1 | 2012-02-10 | 0 | f | f
1 | 2012-02-11 | 0.9 | f | f
1 | 2012-02-12 | 1.2 | f | f
1 | 2012-02-13 | 0 | f | f
1 | 2012-02-14 | 0 | f | f
(14 rows)
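In the result above, valid_for_five_days becomes true only from the fifth day of a run onward, whereas the question's test column marks every day of the run. One way to close that gap, sketched here as an assumption rather than as part of the answer, is to wrap the query and take bool_or over the current row and the 4 following rows. The inline meuk CTE and the column name in_five_day_run are made up for illustration.

```sql
-- Sample rows from the question, built inline so the sketch is self-contained.
with meuk as (
    select 1 as group_id,
           date '2012-02-01' + (i - 1) as zdate,
           flows[i] as min_flow
    from (select array[0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0] as flows) f,
         generate_series(1, 14) as i
),
-- The answer's query: true when the 5 days ending at zdate are all present and positive.
flagged as (
    select m.*,
           exists (
               select 1
               from meuk x
               where x.group_id = m.group_id
                 and x.zdate >= m.zdate - '4 day'::interval
                 and x.zdate <= m.zdate
                 and x.min_flow > 0
               group by x.group_id
               having count(*) >= 5
           ) as valid_for_five_days
    from meuk m
)
-- A day is inside a qualifying run if the flag is true on that day
-- or on one of the 4 following rows.
select group_id, zdate, min_flow, valid_for_five_days,
       bool_or(valid_for_five_days)
           over (partition by group_id
                 order by zdate
                 rows between current row and 4 following) as in_five_day_run
from flagged
order by group_id, zdate;
```

On the sample data in_five_day_run should be true for 2012-02-03 through 2012-02-08, matching the test column from the question.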