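I want to use two different (but similar) window functions to compute two values: the SUM of is_active and the COUNT of rows, partitioned by user_id + item, restricted to rows that are at least 1 hour older than the current row.
My first instinct was to use ROWS UNBOUNDED PRECEDING, but that gives me no way to filter on time: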
COUNT(1) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)
SUM(is_active) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING)
However, this does not take the "1 hour ago" offset into account.
Consider the following data:
user_id | req_time            | item | is_active
--------+---------------------+------+----------
1       | 2011-01-01 12:00:00 | 1    | 0
1       | 2011-01-01 12:30:00 | 1    | 1
1       | 2011-01-01 15:00:00 | 1    | 1
1       | 2011-01-01 16:00:00 | 1    | 0
1       | 2011-01-01 16:00:00 | 2    | 0
1       | 2011-01-01 16:20:00 | 2    | 1
2       | 2011-02-02 11:00:00 | 1    | 1
2       | 2011-02-02 13:00:00 | 1    | 0
1       | 2011-02-02 16:20:00 | 1    | 0
1       | 2011-02-02 16:30:00 | 2    | 0
I would like to get the following result, where "value 1" is SUM(is_active) and "value 2" is COUNT(1):
user_id | req_time            | item | value 1 | value 2
--------+---------------------+------+---------+--------
1       | 2011-01-01 12:00:00 | 1    | 0       | 0
1       | 2011-01-01 12:30:00 | 1    | 0       | 0
1       | 2011-01-01 15:00:00 | 1    | 1       | 2
1       | 2011-01-01 16:00:00 | 1    | 2       | 3
1       | 2011-01-01 16:00:00 | 2    | 0       | 0
1       | 2011-01-01 16:20:00 | 2    | 0       | 0
2       | 2011-02-02 11:00:00 | 1    | 0       | 0
2       | 2011-02-02 13:00:00 | 1    | 1       | 1
1       | 2011-02-02 16:20:00 | 1    | 2       | 4
1       | 2011-02-02 16:30:00 | 2    | 1       | 2
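The expected values can be reproduced with a brute-force reference in Python. This is only a sketch for checking candidate queries against the sample data, not the SQL being asked for; the names ROWS, parse and lagged_totals are made up for illustration. Note that the table implies an inclusive boundary: the 16:00 row for item 1 counts the 15:00 row, which is exactly 1 hour older.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, req_time, item, is_active)
ROWS = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def lagged_totals(rows, lag=timedelta(hours=1)):
    """For each row, return (sum of is_active, row count) over rows of the
    same (user_id, item) whose req_time is at least `lag` older (inclusive)."""
    out = []
    for user_id, req_time, item, _ in rows:
        cutoff = parse(req_time) - lag
        matches = [
            active
            for (u, t, i, active) in rows
            if u == user_id and i == item and parse(t) <= cutoff
        ]
        out.append((sum(matches), len(matches)))
    return out


result = lagged_totals(ROWS)
for row, (value1, value2) in zip(ROWS, result):
    print(row[0], row[1], row[2], value1, value2)
```

The comparison is quadratic in the number of rows, which is fine for a ten-row check but is not meant as a model for the real query.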
I'm using Greenplum 4.21, which is based on PostgreSQL 8.2.15.
Thanks in advance! gilibi
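Answer 0 (score: 2)
I don't know of a way to do this with window functions, at least not easily.
The simplest approach I know of is to use correlated subqueries in the select clause: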
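select t.*,
       -- value1 = SUM(is_active); coalesce turns the NULL for "no earlier rows" into 0
       (select coalesce(sum(t2.is_active), 0) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              -- <= so that a row exactly 1 hour older is included, as in the expected output
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value1,
       -- value2 = COUNT of the same rows
       (select count(*) from t t2
        where t2.user_id = t.user_id and t2.item = t.item and
              t2.req_time <= t.req_time - interval '1 hour'
       ) as value2
from t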
You can also do this without correlated subqueries; it is just a little more cumbersome:
select t.user_id, t.req_time, t.item,
       coalesce(sum(t2.is_active), 0) as value1,  -- 0 rather than NULL when nothing matches
       count(t2.user_id) as value2                -- count(*) would report 1 for unmatched rows
from t left outer join
     t t2
     on t.user_id = t2.user_id and
        t.item = t2.item and
        t2.req_time <= t.req_time - interval '1 hour'
group by t.user_id, t.req_time, t.item
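This is probably more efficient than the correlated subquery version (which runs two correlated subqueries). It should also work in Greenplum; I had not realized that it lacks support for correlated subqueries, which are a fairly important part of ANSI SQL.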
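To see how the self-join approach behaves on the question's sample data, here is a rough sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Greenplum here, so two things are assumptions of the sketch: timestamps are stored as text, and `datetime(t.req_time, '-1 hour')` replaces `t.req_time - interval '1 hour'`. The query counts `t2.user_id` rather than `*`, so that rows with no match report 0, and it uses `<=` to match the expected output for rows exactly 1 hour apart.

```python
import sqlite3

# Sample rows from the question: (user_id, req_time, item, is_active)
SAMPLE = [
    (1, "2011-01-01 12:00:00", 1, 0),
    (1, "2011-01-01 12:30:00", 1, 1),
    (1, "2011-01-01 15:00:00", 1, 1),
    (1, "2011-01-01 16:00:00", 1, 0),
    (1, "2011-01-01 16:00:00", 2, 0),
    (1, "2011-01-01 16:20:00", 2, 1),
    (2, "2011-02-02 11:00:00", 1, 1),
    (2, "2011-02-02 13:00:00", 1, 0),
    (1, "2011-02-02 16:20:00", 1, 0),
    (1, "2011-02-02 16:30:00", 2, 0),
]

# Self-join on (user_id, item), keeping only t2 rows at least 1 hour older.
QUERY = """
select t.user_id, t.req_time, t.item,
       coalesce(sum(t2.is_active), 0) as value1,
       count(t2.user_id)              as value2
from t
left outer join t t2
  on t2.user_id = t.user_id
 and t2.item = t.item
 and t2.req_time <= datetime(t.req_time, '-1 hour')
group by t.user_id, t.req_time, t.item
order by t.req_time, t.user_id, t.item
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (user_id integer, req_time text, item integer, is_active integer)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", SAMPLE)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

One caveat of the group by: if the table can hold two rows with the same user_id, req_time and item, they collapse into a single output row, which the correlated subquery version does not do.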
Answer 1 (score: 1)
PostgreSQL 8.3 at SQL Fiddle. Only one subselect.
select user_id, req_time, item, v[1] as value1, v[2] as value2
from (
    select t.*,
        (
            select array[
                coalesce(sum(is_active::integer), 0),
                count(*)
            ] as v
            from t s
            where
                user_id = t.user_id
                and item = t.item
                and req_time <= t.req_time - interval '1 hour'
        ) as v
    from t
) s
order by req_time, user_id, item