Applying multiple window functions to the same partition

Date: 2009-12-13 10:25:38

Tags: sql postgresql window-functions

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary.)

For example, you can do

SELECT name, first_value(date) OVER (PARTITION BY name ORDER BY date) FROM table1

But is there a way to do something like:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

where we apply two functions over the same window?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

2 answers:

Answer 0 (score: 19)

Couldn't you just use a window per select expression?

Something like this:
SELECT  name, 
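        first_value(date) OVER (PARTITION BY name ORDER BY date) AS f, 
        -- the explicit frame makes last_value() see the whole partition;
        -- with the default frame it would return the current row's date
        last_value(date)  OVER (PARTITION BY name ORDER BY date
                                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS l 
FROM table1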
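To see why last_value() needs an explicit frame here, below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: SQLite 3.25 or later, whose default window frame with ORDER BY, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, matches PostgreSQL's). The rows in table1 are made up for illustration.

```python
import sqlite3

# Hypothetical table1(name, date) with ISO-8601 date strings.
# SQLite stands in for PostgreSQL (assumes SQLite 3.25+ for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", "2009-01-01"), ("a", "2009-02-01"), ("a", "2009-03-01"), ("b", "2009-05-01")],
)

query = """
SELECT name, date,
       first_value(date) OVER (PARTITION BY name ORDER BY date) AS f,
       -- default frame ends at the current row: NOT the partition's last value
       last_value(date)  OVER (PARTITION BY name ORDER BY date) AS l_default,
       -- explicit frame covering the whole partition
       last_value(date)  OVER (PARTITION BY name ORDER BY date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS l_full
FROM table1
ORDER BY name, date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With distinct dates, l_default simply echoes each row's own date, while l_full gives the last date of each name.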

Also, from your reference, you can do it like this:

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
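A quick way to check that the named window is only shorthand: the sketch below runs the same two aggregates once with WINDOW w and once with the definition repeated inline, then compares the results. It uses Python's sqlite3 module in place of PostgreSQL (assuming SQLite 3.28 or later, which supports the WINDOW clause), and the empsalary rows are hypothetical. Because w contains an ORDER BY, sum and avg are running values within each department.

```python
import sqlite3

# Hypothetical empsalary rows; SQLite stands in for PostgreSQL (assumes 3.28+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?)",
    [("develop", 1, 5200), ("develop", 2, 4200), ("sales", 3, 4800), ("sales", 4, 3900)],
)

# Both aggregates share one named window definition.
named = conn.execute("""
SELECT empno, sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
ORDER BY empno
""").fetchall()

# The same window definition written out twice.
inline = conn.execute("""
SELECT empno,
       sum(salary) OVER (PARTITION BY depname ORDER BY salary DESC),
       avg(salary) OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary
ORDER BY empno
""").fetchall()

print(named)
print(named == inline)
```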

Answer 1 (score: 14)

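Warning: I'm not deleting this answer because it seems technically correct and may therefore be helpful, but beware: PARTITION BY bar ORDER BY foo is probably not what you want to do anyway. With ORDER BY, aggregate functions no longer compute over the partition's elements as a whole: the default frame ends at the current row, so you get a running aggregate. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar) (see the proof at the end of this answer).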

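Although it doesn't improve performance per se, if you use the same partition several times you probably want to use the second syntax proposed by Bystander rather than the first one, and not only because it is cheaper to write. Here's why.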

Consider the following query:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

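Since, in principle, ordering has no effect on computing an average, you might be tempted to use the following query instead (no ORDER BY on the second partition):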

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

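This is a big mistake, because it takes noticeably longer: the second plan needs two WindowAgg nodes, one per distinct window definition, and its total runtime is about 3060 ms instead of 2459 ms. Proof: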

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 rows)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 rows)

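Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have the same partition ten or more times and you keep updating the query over several days, it is easy to add the ORDER BY clause everywhere except on a partition that doesn't seem to need it.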

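This is where the WINDOW syntax helps: it protects you from such careless mistakes (provided, of course, that you know it is better to minimize the number of distinct windows). The following is strictly equivalent to the first query (as far as I can tell from EXPLAIN ANALYZE):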

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY foo);

Update after the warning:

I understand that the statement "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 rows)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
(4 rows)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 rows)
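The same example can be reproduced with a minimal sketch that uses Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.28 or later; only the avg column is checked, since SQLite has no array_agg). It also adds a third variant, an assumption beyond the original answer: an ordered window with an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame, which gives back the whole-partition average.

```python
import sqlite3

# Same four rows as the foobar table above; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foobar (foo INTEGER, bar INTEGER)")
conn.executemany("INSERT INTO foobar VALUES (?, ?)", [(1, 1), (2, 2), (3, 1), (4, 2)])

def run(window_def):
    # Returns (foo, bar, avg(foo)) for every row, in a stable output order.
    sql = ("SELECT foo, bar, avg(foo) OVER qux FROM foobar "
           "WINDOW qux AS (" + window_def + ") ORDER BY bar, foo")
    return conn.execute(sql).fetchall()

# No ORDER BY: the frame is the whole partition.
whole = run("PARTITION BY bar")
# ORDER BY foo: default frame ends at the current row, so a running average.
running = run("PARTITION BY bar ORDER BY foo")
# ORDER BY foo with an explicit full frame: whole-partition average again.
full = run("PARTITION BY bar ORDER BY foo "
           "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")

print(whole)
print(running)
print(full)
```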