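Is it possible to apply multiple window functions to the same partition? (Please correct me if I am not using the right vocabulary.)
For example, you can do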
SELECT name, first_value() over (partition by name order by date) from table1
But is there a way to do something like:
SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1
where two functions are applied over the same window?
Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html
Answer 0 (score: 19)
Can't you just repeat the window for each selected expression?
Something like
SELECT name,
first_value() OVER (partition by name order by date) as f,
last_value() OVER (partition by name order by date) as l
from table1
Also, as shown in your reference, you can name the window once and reuse it:
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
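Applied to the question, the named-window form could look like the sketch below. The table `events` and its columns `event_date` and `val` are hypothetical stand-ins for the asker's `table1`, since the question does not say which column `first_value()` and `last_value()` should read (both functions require an argument). The explicit frame is an addition of this sketch: without it, the default frame ends at the current row, so `last_value` would return the current row's value rather than the last one in the partition.

```sql
-- Hypothetical sample data; the names are assumptions, not from the question.
CREATE TEMP TABLE events (name text, event_date date, val integer);
INSERT INTO events VALUES
    ('a', '2020-01-01', 10),
    ('a', '2020-01-02', 20),
    ('b', '2020-01-01', 30);

SELECT name,
       event_date,
       first_value(val) OVER w AS f,  -- earliest val per name
       last_value(val)  OVER w AS l   -- latest val per name
FROM events
WINDOW w AS (
    PARTITION BY name
    ORDER BY event_date
    -- Make every row see the whole partition, not just rows up to itself.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY name, event_date;
```

Both functions share the single definition `w`, so the partitioning and ordering are written only once.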
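Answer 1 (score: 14)
Warning: I am not deleting this answer because it seems technically correct and may therefore be helpful, but beware: PARTITION BY bar ORDER BY foo
is probably not what you want anyway. With an ORDER BY in the window, aggregate functions are not computed over the whole partition: the default frame runs from the start of the partition to the current row (including its peers in the ordering). That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)
(see the demonstration at the end of this answer).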
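One way to see the difference is to write out the frame that PostgreSQL applies by default when a window has an ORDER BY. The following is a minimal sketch against the `foobar(foo, bar)` table used in this answer; the column aliases are illustrative only.

```sql
-- Assumes a table foobar with numeric columns foo and bar.
SELECT foo,
       bar,
       -- Frame left implicit: running average within each bar partition.
       avg(foo) OVER (PARTITION BY bar ORDER BY foo) AS implicit_frame,
       -- The same frame spelled out; this column matches implicit_frame.
       avg(foo) OVER (PARTITION BY bar ORDER BY foo
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS explicit_default,
       -- No ORDER BY: the frame is the whole partition.
       avg(foo) OVER (PARTITION BY bar) AS whole_partition
FROM foobar
ORDER BY bar, foo;
```

The first two columns should always agree, while the third differs from them on every row that is not the last peer group of its partition.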
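Although it does not improve performance by itself, if you use the same partition several times you probably want the second syntax proposed in the answer above (the WINDOW clause), and not only because it is shorter to write. Here is why.
Consider the following query: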
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar ORDER BY foo)
FROM
foobar;
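Since, in principle, ordering has no effect on how an average is computed, you might be tempted to use the following query instead (no ordering in the second window):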
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar)
FROM
foobar;
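This is a big mistake, because it takes noticeably longer: the two distinct window definitions require two WindowAgg passes instead of one (about 3.06 s versus 2.46 s in the plans below). Proof: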
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)
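Now, if you are aware of this issue, you will of course use the same window everywhere. But when the same window appears ten or more times and you keep editing the query over several days, it is easy to forget the ORDER BY clause on a window that does not need it on its own.
This is where the WINDOW syntax comes in: it protects you from such careless mistakes (provided, of course, you know that it is better to minimize the number of distinct window definitions). The following is strictly equivalent to the first query (as far as I can tell from EXPLAIN ANALYZE):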
SELECT
array_agg(foo)
OVER qux,
avg(baz)
OVER qux
FROM
foobar
WINDOW
qux AS (PARTITION BY bar ORDER BY foo);
I understand that the statement "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo) is not equivalent to SELECT avg(foo) OVER (PARTITION BY bar)" may seem questionable, so here is an example:
# SELECT * FROM foobar;
foo | bar
-----+-----
1 | 1
2 | 2
3 | 1
4 | 2
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
array_agg | avg
-----------+-----
{1,3} | 2
{1,3} | 2
{2,4} | 3
{2,4} | 3
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
array_agg | avg
-----------+-----
{1} | 1
{1,3} | 2
{2} | 2
{2,4} | 3
(4 lines)
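If the ORDER BY has to stay in the shared window (for example because another function relies on it) but the aggregates should still cover the whole partition, an explicit frame clause is one option. The sketch below reuses the four-row `foobar` table from the example above; the frame clause is an addition that does not appear in the original answer.

```sql
-- Assumes the four-row foobar(foo, bar) table shown above.
SELECT array_agg(foo) OVER qux,
       avg(foo)       OVER qux
FROM foobar
WINDOW qux AS (
    PARTITION BY bar
    ORDER BY foo
    -- Override the default frame so each row sees its entire partition.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

With this frame every row aggregates over its full partition again, so the values should match the first result set above (the one without ORDER BY), with the array elements in `foo` order.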