PostgreSQL - How to create a column flagging whether another column is positive over a certain range of data

Posted: 2015-11-26 20:00:50

Tags: postgresql

I'm new to PostgreSQL and learning day by day. I'm using PostgreSQL 9.4.

I have daily data and want to create a binary variable that is 1 if another variable (here min_flow) is positive for at least 5 consecutive days.

The data has the following structure ("test" is the variable I want to create):

 group_id |    date    | min_flow | test
----------+------------+----------+------
        1 | 2012-02-01 |        0 |    0
        1 | 2012-02-02 |        0 |    0
        1 | 2012-02-03 |      1.5 |    1
        1 | 2012-02-04 |        1 |    1
        1 | 2012-02-05 |      0.7 |    1
        1 | 2012-02-06 |      0.8 |    1
        1 | 2012-02-07 |      1.2 |    1
        1 | 2012-02-08 |      1.5 |    1
        1 | 2012-02-09 |        0 |    0
        1 | 2012-02-10 |        0 |    0
        1 | 2012-02-11 |      0.9 |    0
        1 | 2012-02-12 |      1.2 |    0
        1 | 2012-02-13 |        0 |    0
        1 | 2012-02-14 |        0 |    0
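Read off the sample, the rule appears to be this: `test` is 1 exactly when `min_flow > 0` and the row lies inside an unbroken run of at least five positive days within its `group_id`. The two positive days on 2012-02-11 and 2012-02-12 therefore stay 0. A minimal plain-Python reference model of that reading (not PostgreSQL) is handy for checking any SQL version against. It assumes rows sorted by `group_id` then date, with exactly one row per calendar day. `flag_runs`, `ROWS` and `EXPECTED` are illustrative names only.

```python
from itertools import groupby

# Sample rows from the question: (group_id, date, min_flow).
# Assumption: sorted by group_id then date, exactly one row per calendar day.
ROWS = [
    (1, "2012-02-01", 0), (1, "2012-02-02", 0), (1, "2012-02-03", 1.5),
    (1, "2012-02-04", 1), (1, "2012-02-05", 0.7), (1, "2012-02-06", 0.8),
    (1, "2012-02-07", 1.2), (1, "2012-02-08", 1.5), (1, "2012-02-09", 0),
    (1, "2012-02-10", 0), (1, "2012-02-11", 0.9), (1, "2012-02-12", 1.2),
    (1, "2012-02-13", 0), (1, "2012-02-14", 0),
]
EXPECTED = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def flag_runs(rows, min_len=5):
    """Return 1 for rows inside a run of >= min_len consecutive positive days."""
    flags = []
    # Runs never span groups: split by group_id first, then by sign of min_flow.
    for _, group_rows in groupby(rows, key=lambda r: r[0]):
        for positive, run in groupby(group_rows, key=lambda r: r[2] > 0):
            run = list(run)
            flag = int(positive and len(run) >= min_len)
            flags.extend([flag] * len(run))
    return flags


print(flag_runs(ROWS) == EXPECTED)  # True
```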

I achieved this using window functions:

SELECT CASE WHEN min_flow > 0                                
    AND                                        
    (
    -- current row + next 4 rows have min_flow > 0
    ((lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0     
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lead(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row + previous 4 rows have min_flow > 0
    (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 4) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row + next 3 rows + previous 1 row have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lead(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row + next 2 rows + previous 2 rows have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lead(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    OR (
    -- current row + next 1 row + previous 3 rows have min_flow > 0
    (lead(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0         
    AND (lag(min_flow, 1) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0 
    AND (lag(min_flow, 2) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0
    AND (lag(min_flow, 3) OVER (PARTITION BY group_id ORDER BY group_id, _date_)) > 0)
    )
    THEN 1 ELSE 0 END AS test
FROM a_table  -- "table" itself is a reserved word; use the real table name here
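The five OR branches enumerate every five-day window that contains the current row. Such a window can start 0, 1, 2, 3 or 4 rows before it. A minimal plain-Python model of that CASE expression (not PostgreSQL) makes the pattern explicit. It covers one `group_id` with rows in date order, and `lead_lag_flag` and `FLOWS` are hypothetical names.

```python
# min_flow values for group_id 1 from the question, ordered by date
FLOWS = [0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0]


def lead_lag_flag(flows, width=5):
    """Model of the CASE expression: 1 if any width-row window holding row i is all positive."""
    n = len(flows)

    def positive(j):
        # lead()/lag() past either end of the partition return NULL,
        # and NULL > 0 never makes the WHEN condition true.
        return 0 <= j < n and flows[j] > 0

    flags = []
    for i in range(n):
        # k = 0 is "current + next 4", k = width - 1 is "current + previous 4",
        # and the values in between are the mixed lead/lag branches.
        hit = any(
            all(positive(j) for j in range(i - k, i - k + width))
            for k in range(width)
        )
        flags.append(int(hit))
    return flags


print(lead_lag_flag(FLOWS))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

In this model the window width is a single parameter. The hand-written SQL, by contrast, needs width * (width - 1) lead/lag comparisons, which is 20 for a width of 5.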

However, I wonder whether there is a better or more efficient way to do this?

Any help is much appreciated. Thanks in advance!

3 answers:

Answer 0 (score: 0)

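I think I'd consider a nested query. The inner query first checks whether there are five consecutive positive values in the last five days, with: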

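-- RANGE with an interval offset needs PostgreSQL 11 or later; on 9.4, with exactly
-- one row per day, ROWS BETWEEN 4 PRECEDING AND CURRENT ROW is the equivalent
sum(case when min_flow > 0 then 1 else 0 end)
  over (partition by  group_id
        order by      _date_
        range between '4 days'::interval preceding
                  and current row) count_of_positives_in_last_5_days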

...and the outer query then looks for the maximum of that value over the next five days:

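-- outer query (window function calls cannot be nested in one SELECT);
-- on 9.4: ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
max(count_of_positives_in_last_5_days)
  over (partition by  group_id
        order by      _date_
        range between current row
                  and '4 days'::interval following) max_count_of_positives_in_last_5_days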

If that maximum is 5, return 1; otherwise return 0.
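The two passes can be modelled in plain Python (again not PostgreSQL; `two_pass_flag` is a hypothetical name). This is a minimal sketch that assumes the rows belong to a single `group_id` with unique dates. Pass 1 mirrors the `RANGE ... PRECEDING AND CURRENT ROW` sum, and pass 2 mirrors the `RANGE CURRENT ROW AND ... FOLLOWING` max. Both windows are measured in calendar days rather than rows. A missing date therefore leaves at most four rows in a five-day window, so a gap breaks a run instead of being skipped over.

```python
from datetime import date, timedelta

FLOWS = [0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0]
# (date, min_flow) for group_id 1, one row per day starting 2012-02-01
ROWS = [(date(2012, 2, 1) + timedelta(days=i), f) for i, f in enumerate(FLOWS)]


def two_pass_flag(rows, days=5):
    """rows: (date, min_flow) pairs for one group_id, dates unique."""
    span = timedelta(days=days - 1)
    # Pass 1: number of positive rows dated within [d - 4 days, d]
    trailing = {
        d: sum(1 for e, f in rows if d - span <= e <= d and f > 0)
        for d, _ in rows
    }
    # Pass 2: the largest pass-1 count dated within [d, d + 4 days]
    return [
        int(max(c for e, c in trailing.items() if d <= e <= d + span) == days)
        for d, _ in rows
    ]


print(two_pass_flag(ROWS))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Drop 2012-02-05: the six-day run is broken, so no row qualifies any more.
gapped = [r for r in ROWS if r[0] != date(2012, 2, 5)]
print(sum(two_pass_flag(gapped)))  # 0
```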

If you could put your data in an SQLFiddle, I could explain it better and test whether it actually works :)

Answer 1 (score: 0)

Determine groups of consecutive rows from the difference between row_number() over all rows and row_number() over partitions by positive / non-positive min_flow:

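select
    group_id,
    _date_,
    min_flow,
    -- only islands of positive rows with 5 or more members qualify
    (min_flow > 0 and count(*) over w_diff > 4)::int test
from (
    select *,
        row_number() over w_all rn_all,
        row_number() over w_pos rn_pos
    from a_table
    window
        w_all as (partition by group_id order by _date_),
        w_pos as (partition by group_id, min_flow > 0 order by _date_)
    ) s
    -- rn_all - rn_pos is constant within each run of same-sign rows
    window w_diff as (partition by group_id, min_flow > 0, rn_all - rn_pos)
    order by group_id, _date_;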

Query illustrating the approach:

select
    *,
    rn_all - rn_pos diff,
    (min_flow > 0 and count(*) over w_diff > 4)::int test
from (
    select *,
        row_number() over w_all rn_all,
        row_number() over w_pos rn_pos
    from a_table
    window
        w_all as (partition by group_id order by _date_),
        w_pos as (partition by group_id, min_flow > 0 order by _date_)
    ) s
    window w_diff as (partition by group_id, min_flow > 0, rn_all - rn_pos)
    order by group_id, _date_;

 group_id |   _date_   | min_flow | rn_all | rn_pos | diff | test 
----------+------------+----------+--------+--------+------+------
        1 | 2012-02-01 |        0 |      1 |      1 |    0 |    0
        1 | 2012-02-02 |        0 |      2 |      2 |    0 |    0
        1 | 2012-02-03 |      1.5 |      3 |      1 |    2 |    1
        1 | 2012-02-04 |        1 |      4 |      2 |    2 |    1
        1 | 2012-02-05 |      0.7 |      5 |      3 |    2 |    1
        1 | 2012-02-06 |      0.8 |      6 |      4 |    2 |    1
        1 | 2012-02-07 |      1.2 |      7 |      5 |    2 |    1
        1 | 2012-02-08 |      1.5 |      8 |      6 |    2 |    1
        1 | 2012-02-09 |        0 |      9 |      3 |    6 |    0
        1 | 2012-02-10 |        0 |     10 |      4 |    6 |    0
        1 | 2012-02-11 |      0.9 |     11 |      7 |    4 |    0
        1 | 2012-02-12 |      1.2 |     12 |      8 |    4 |    0
        1 | 2012-02-13 |        0 |     13 |      5 |    8 |    0
        1 | 2012-02-14 |        0 |     14 |      6 |    8 |    0
(14 rows)
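Why the difference works: inside an unbroken run of same-sign rows, `rn_all` and `rn_pos` both grow by one per row, so `rn_all - rn_pos` stays constant. It jumps whenever rows of the other sign intervene. The minimal plain-Python model below reproduces the `diff` column above. It assumes one `group_id` with rows already in date order, and `island_keys` and `island_flags` are hypothetical names.

The model also shows why the island key is best paired with the sign of `min_flow`. For the flows `[1, 0, 1]`, the zero row and the last positive row share `diff = 1`. It shows why the flag should also require `min_flow > 0`, since otherwise five zero days in a row would form a qualifying island.

```python
from collections import Counter

FLOWS = [0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0]


def island_keys(flows):
    """Return (is_positive, rn_all - rn_pos) for each row, in order."""
    rn_pos = Counter()  # running row_number() within each (min_flow > 0) partition
    keys = []
    for rn_all, flow in enumerate(flows, start=1):
        positive = flow > 0
        rn_pos[positive] += 1
        keys.append((positive, rn_all - rn_pos[positive]))
    return keys


def island_flags(flows, min_len=5):
    keys = island_keys(flows)
    size = Counter(keys)  # plays the role of count(*) over the island partition
    # Only islands of positive rows qualify.
    return [int(pos and size[(pos, d)] >= min_len) for pos, d in keys]


print([d for _, d in island_keys(FLOWS)])
# [0, 0, 2, 2, 2, 2, 2, 2, 6, 6, 4, 4, 8, 8]
print(island_flags(FLOWS))
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(island_keys([1, 0, 1]))
# [(True, 0), (False, 1), (True, 1)]
```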

Answer 2 (score: 0)

Naive counting, assuming dates are unique per group_id (i.e. (group_id, zdate) is treated as a candidate key):

SELECT m.*, EXISTS(
        SELECT 1 FROM meuk x
        WHERE x.group_id = m.group_id
        AND x.zdate >= m.zdate - '4 day'::interval
        AND x.zdate <= m.zdate
        AND x.min_flow > 0
        GROUP BY x.group_id
        HAVING COUNT(*) >= 5
        ) AS valid_for_five_days
FROM meuk m
        ;

Result (`test` is the expected column from the question, carried along by `m.*`):

 group_id |   zdate    | min_flow | test | valid_for_five_days 
----------+------------+----------+------+---------------------
        1 | 2012-02-01 |        0 | f    | f
        1 | 2012-02-02 |        0 | f    | f
        1 | 2012-02-03 |      1.5 | t    | f
        1 | 2012-02-04 |        1 | t    | f
        1 | 2012-02-05 |      0.7 | t    | f
        1 | 2012-02-06 |      0.8 | t    | f
        1 | 2012-02-07 |      1.2 | t    | t
        1 | 2012-02-08 |      1.5 | t    | t
        1 | 2012-02-09 |        0 | f    | f
        1 | 2012-02-10 |        0 | f    | f
        1 | 2012-02-11 |      0.9 | f    | f
        1 | 2012-02-12 |      1.2 | f    | f
        1 | 2012-02-13 |        0 | f    | f
        1 | 2012-02-14 |        0 | f    | f
(14 rows)
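The `valid_for_five_days` column can be reproduced with a minimal plain-Python model of the correlated `EXISTS`. The sketch assumes one `group_id` with unique dates, and `trailing_five` is a hypothetical name. It counts positive rows dated within the four days before a row plus the row itself. That count only reaches five on the fifth and later day of a run, which is why this column differs from `test` for 2012-02-03 through 2012-02-06. Spreading each true value forward over the next four days, as the max step in Answer 0 does, would close that gap.

```python
from datetime import date, timedelta

FLOWS = [0, 0, 1.5, 1, 0.7, 0.8, 1.2, 1.5, 0, 0, 0.9, 1.2, 0, 0]
ROWS = [(date(2012, 2, 1) + timedelta(days=i), f) for i, f in enumerate(FLOWS)]


def trailing_five(rows, days=5):
    """Model of EXISTS(... zdate in [d - 4 days, d] AND min_flow > 0 ... HAVING COUNT(*) >= 5)."""
    span = timedelta(days=days - 1)
    return [
        sum(1 for e, f in rows if d - span <= e <= d and f > 0) >= days
        for d, _ in rows
    ]


for (d, _), ok in zip(ROWS, trailing_five(ROWS)):
    if ok:
        print(d)
# 2012-02-07
# 2012-02-08
```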