I have a relatively large table (about a million records) with the following columns:

The account is in practice a UUID, but I don't think that matters here. If I run the following simple query, it takes about 16 seconds on my machine:
select account, "group", classification, max(size)
from mytable
group by account, "group", classification
So far so good. Now suppose I add an index:

CREATE INDEX CONCURRENTLY ON mytable (account, "group", classification);
If I run the same query again, it now returns in well under half a second, and EXPLAIN clearly shows that the index is used.

However, if I change the query to
select account, "group", classification, max(size)
from mytable
group by account, classification, "group"
it takes 16 seconds again and the index is no longer used. To my mind the order of the grouping criteria should not matter, but I am no expert. Any idea why PostgreSQL cannot (or does not) optimize the latter query? I tried this on PostgreSQL 9.4.

Edit: as requested, here is the EXPLAIN output. For the call that uses the index:
Group  (cost=0.55..133878.11 rows=95152 width=76) (actual time=0.090..660.739 rows=807 loops=1)
  Group Key: group_id, classification_id, account_id
  ->  Index Only Scan using mytable_group_id_classification_id_account_id_idx on mytable  (cost=0.55..126741.72 rows=951518 width=76) (actual time=0.088..534.645 rows=951518 loops=1)
        Heap Fetches: 951518
Planning time: 0.106 ms
Execution time: 660.852 ms
And for the call with the reordered GROUP BY criteria:
Group  (cost=162327.31..171842.49 rows=95152 width=76) (actual time=11114.130..13938.487 rows=807 loops=1)
  Group Key: group_id, account_id, classification_id
  ->  Sort  (cost=162327.31..164706.10 rows=951518 width=76) (actual time=11114.127..13775.235 rows=951518 loops=1)
        Sort Key: group_id, account_id, classification_id
        Sort Method: external merge  Disk: 81136kB
        ->  Seq Scan on mytable  (cost=0.00..25562.18 rows=951518 width=76) (actual time=0.009..192.259 rows=951518 loops=1)
Planning time: 0.111 ms
Execution time: 13948.380 ms
Answer 0 (score: 2)
The result is the same regardless of the order in which the columns appear in the GROUP BY clause, so the same execution plan could be used for both queries. The PostgreSQL optimizer, however, does not consider reordering the GROUP BY expressions to check whether a different ordering would match an existing index.

This is a limitation; you could ask on the pgsql-hackers mailing list whether such an enhancement is wanted, ideally backing the request with a patch that implements it.

That said, I am not sure such an improvement would be accepted. The downside of this enhancement is that the optimizer would have to do more work, which would add planning time to every query that uses a GROUP BY clause. Moreover, the limitation is easy to work around: simply rewrite the query and change the order of the GROUP BY expressions. So I would say things should stay as they are.
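Concretely, the workaround described above amounts to rewriting the question's slow query so that the grouping expressions are listed in the same order as the index columns; the result rows are identical either way (note that group is a reserved word and needs quoting in real SQL):

```sql
-- Equivalent to GROUP BY account, classification, "group", but written in the
-- index order (account, "group", classification) so the index can be used:
SELECT account, "group", classification, max(size)
FROM mytable
GROUP BY account, "group", classification;
```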
Answer 1 (score: 2)
Actually, the order of the columns in the GROUP BY clause does affect the result: by default, the rows come back ordered by the GROUP BY columns. If you add your own ORDER BY, both the result and the index usage are the same regardless of the GROUP BY order.

Demonstration:
CREATE TABLE coconuts (
  mass int,
  volume int,
  loveliness int
);

INSERT INTO coconuts (mass, volume, loveliness)
SELECT (random() * 5)::int
     , (random() * 5)::int
     , (random() * 1000 + 9000)::int
FROM GENERATE_SERIES(1, 10000000);
Note how the order of the columns in GROUP BY affects the output order:
SELECT mass, volume, max(loveliness)
FROM coconuts
GROUP BY mass, volume;
mass | volume | max
------+--------+-------
0 | 0 | 10000
0 | 1 | 10000
0 | 2 | 10000
...
SELECT mass, volume, max(loveliness)
FROM coconuts
GROUP BY volume, mass;
mass | volume | max
------+--------+-------
0 | 0 | 10000
1 | 0 | 10000
2 | 0 | 10000
...
...and how it affects the query plan:
CREATE INDEX ON coconuts (mass, volume);
SET enable_seqscan=false; --To force the index if possible
EXPLAIN
SELECT mass, volume, max(loveliness)
FROM coconuts
GROUP BY (mass, volume);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate  (cost=1000.46..460459.11 rows=40000 width=12)
  Group Key: mass, volume
  ->  Gather Merge  (cost=1000.46..459459.11 rows=80000 width=12)
        Workers Planned: 2
        ->  Partial GroupAggregate  (cost=0.43..449225.10 rows=40000 width=12)
              Group Key: mass, volume
              ->  Parallel Index Scan using coconuts_mass_volume_idx on coconuts  (cost=0.43..417575.10 rows=4166667 width=12)
(7 rows)
EXPLAIN
SELECT mass, volume, max(loveliness)
FROM coconuts
GROUP BY (volume, mass);
QUERY PLAN
------------------------------------------------------------------------------------------------
GroupAggregate  (cost=10001658532.83..10001758932.83 rows=40000 width=12)
  Group Key: volume, mass
  ->  Sort  (cost=10001658532.83..10001683532.83 rows=10000000 width=12)
        Sort Key: volume, mass
        ->  Seq Scan on coconuts  (cost=10000000000.00..10000154055.00 rows=10000000 width=12)
(5 rows)
However, if you make the ORDER BY match the original GROUP BY, the original query plan comes back, at least on Postgres 11.5:
EXPLAIN
SELECT mass, volume, max(loveliness)
FROM coconuts
GROUP BY volume, mass
ORDER BY mass, volume;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate  (cost=1000.46..460459.11 rows=40000 width=12)
  Group Key: mass, volume
  ->  Gather Merge  (cost=1000.46..459459.11 rows=80000 width=12)
        Workers Planned: 2
        ->  Partial GroupAggregate  (cost=0.43..449225.10 rows=40000 width=12)
              Group Key: mass, volume
              ->  Parallel Index Scan using coconuts_mass_volume_idx on coconuts  (cost=0.43..417575.10 rows=4166667 width=12)
(7 rows)
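One housekeeping note on the demonstration: SET enable_seqscan = false only affects the current session, but it is still worth restoring the default once you are done experimenting, so later queries in the same session are planned normally:

```sql
RESET enable_seqscan;  -- let the planner consider sequential scans again
```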