Filter window function by hour

Asked: 2013-02-21 16:52:49

Tags: sql postgresql greenplum

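I would like to calculate two values using two different (but similar) window functions: SUM over is_active and COUNT, both partitioned by user_id + item, and only over rows that are at least 1 hour older than the current row. My intuition was to use ROWS UNBOUNDED PRECEDING, but that way I cannot filter on time: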

COUNT(1) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING) 
SUM(is_active) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)

However, this does not take the '1 hour ago' interval into account.

Consider the following data:

 user_id |      req_time       | item | is_active
---------+---------------------+------+-----------
       1 | 2011-01-01 12:00:00 |    1 |         0
       1 | 2011-01-01 12:30:00 |    1 |         1
       1 | 2011-01-01 15:00:00 |    1 |         1
       1 | 2011-01-01 16:00:00 |    1 |         0
       1 | 2011-01-01 16:00:00 |    2 |         0
       1 | 2011-01-01 16:20:00 |    2 |         1
       2 | 2011-02-02 11:00:00 |    1 |         1
       2 | 2011-02-02 13:00:00 |    1 |         0
       1 | 2011-02-02 16:20:00 |    1 |         0
       1 | 2011-02-02 16:30:00 |    2 |         0
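
For anyone who wants to reproduce the sample, the rows above could be loaded with something like the following. This is only a sketch: the table name t is borrowed from the answers, and the column types are assumptions, since the question does not state them.

```sql
-- Hypothetical setup script; the table name and column types are assumed.
CREATE TABLE t (
    user_id   integer,
    req_time  timestamp,
    item      integer,
    is_active integer
);

-- The ten sample rows, in the same order as the table above.
INSERT INTO t (user_id, req_time, item, is_active) VALUES
    (1, '2011-01-01 12:00:00', 1, 0),
    (1, '2011-01-01 12:30:00', 1, 1),
    (1, '2011-01-01 15:00:00', 1, 1),
    (1, '2011-01-01 16:00:00', 1, 0),
    (1, '2011-01-01 16:00:00', 2, 0),
    (1, '2011-01-01 16:20:00', 2, 1),
    (2, '2011-02-02 11:00:00', 1, 1),
    (2, '2011-02-02 13:00:00', 1, 0),
    (1, '2011-02-02 16:20:00', 1, 0),
    (1, '2011-02-02 16:30:00', 2, 0);
```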

I would like to get the following result, where "value 1" is SUM(is_active) and "value 2" is COUNT(1):

 user_id |      req_time       | item | value 1 | value 2
---------+---------------------+------+---------+---------
       1 | 2011-01-01 12:00:00 |    1 |       0 |       0
       1 | 2011-01-01 12:30:00 |    1 |       0 |       0
       1 | 2011-01-01 15:00:00 |    1 |       1 |       2
       1 | 2011-01-01 16:00:00 |    1 |       2 |       3
       1 | 2011-01-01 16:00:00 |    2 |       0 |       0
       1 | 2011-01-01 16:20:00 |    2 |       0 |       0
       2 | 2011-02-02 11:00:00 |    1 |       0 |       0
       2 | 2011-02-02 13:00:00 |    1 |       1 |       1
       1 | 2011-02-02 16:20:00 |    1 |       2 |       4
       1 | 2011-02-02 16:30:00 |    2 |       1 |       2

I am using Greenplum 4.21, which is based on PostgreSQL 8.2.15.

Thanks in advance! gilibi

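2 answers:

Answer 0 (score: 2)

I don't know of a way to do this with window functions, at least not an easy one.

The simplest approach I know of is to use correlated subqueries in the select clause: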

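select t.*,
       -- value1 = SUM(is_active), as defined in the question; coalesce maps "no rows" to 0
       (select coalesce(sum(t2.is_active), 0) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              -- <= so that a row exactly 1 hour older is included, as in the expected output
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value1,
       -- value2 = COUNT of the same rows
       (select count(*) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value2
from t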

You can also do this without correlated subqueries. It is just a little more cumbersome:

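select t.user_id, t.req_time, t.item,
       coalesce(sum(t2.is_active), 0) as value1,  -- sum is null when no row matches
       count(t2.user_id) as value2                -- count(*) would report 1 for an unmatched row
from t left outer join
     t t2
     on t.user_id = t2.user_id and
        t.item = t2.item and
        -- <= so that a row exactly 1 hour older is included, as in the expected output
        t2.req_time <= t.req_time - interval '1 hour'
group by t.user_id, t.req_time, t.item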

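This might be more efficient than the correlated subquery version (since that one has two correlated subqueries). It should also work in Greenplum. I did not realize that Greenplum lacks support for correlated subqueries, which are a fairly important part of ANSI SQL.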
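
One detail to watch when counting over a left outer join like the one above: count(*) counts the null-extended row produced for an unmatched left row, while count() on a column from the right side does not. The tiny example below uses made-up inline data (the aliases a and b are purely illustrative) to show the difference:

```sql
-- Two left-side ids, only one of which has a match on the right side.
SELECT a.id,
       COUNT(*)    AS count_star,  -- counts the null-extended row too
       COUNT(b.id) AS count_col    -- ignores rows where b.id is null
FROM (SELECT 1 AS id UNION ALL SELECT 2) a
LEFT JOIN (SELECT 1 AS id) b ON a.id = b.id
GROUP BY a.id
ORDER BY a.id;
-- id = 1: count_star = 1, count_col = 1
-- id = 2: count_star = 1, count_col = 0
```

For the question's data this matters for every row that has no earlier row at least an hour older, where the expected count is 0.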

Answer 1 (score: 1)

Working on PostgreSQL 8.3 at SQL Fiddle. It needs only one subselect.

select user_id, req_time, item, v[1] as value1, v[2] as value2
from (
    select t.*,
        (
            select array[
                coalesce(sum(is_active::integer), 0),
                count(*)
                ] as v
            from t s
            where
                user_id = t.user_id
                and item = t.item
                and req_time <= t.req_time - interval '1 hour'
        ) as v
    from t
) s
order by req_time, user_id, item
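
As a side note for readers on a newer engine: PostgreSQL 11 and later accept RANGE frames with an interval offset, which lets a window function express the "at least 1 hour older" filter directly. The sketch below assumes such a version and the same table t; the thread does not establish whether the asker's Greenplum release accepts this frame, so treat it as an alternative for other setups rather than a fix for the original environment.

```sql
-- Assumes PostgreSQL 11+ (RANGE frame with an interval offset).
SELECT user_id, req_time, item,
       COALESCE(SUM(is_active) OVER w, 0) AS value1,  -- SUM over an empty frame is NULL
       COUNT(*) OVER w                    AS value2   -- COUNT over an empty frame is 0
FROM t
WINDOW w AS (
    PARTITION BY user_id, item
    ORDER BY req_time
    -- frame = every row in the partition whose req_time <= current req_time - 1 hour
    RANGE BETWEEN UNBOUNDED PRECEDING AND INTERVAL '1 hour' PRECEDING
)
ORDER BY req_time, user_id, item;
```

The frame end is inclusive, so a row exactly one hour older than the current row is counted, which matches the expected output in the question.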