I have a very simple query on a very large foreign (column-store) table:
select distinct "Business Unit", "Application", "Application Suite", "Account Name"
from __tmp_l1_11259
where "Date" between '2017-10-01' and '2017-10-31';
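When I run this query in SQL Server with the hash group hint, it executes in less than two seconds. PostgreSQL can execute it in two ways: with HashAggregate, or by sorting the rows and then grouping the unique ones. Here are the two plans: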
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2169041.54..2169041.55 rows=1 width=128) (actual time=11671.983..11672.065 rows=407 loops=1)
  Group Key: "Business Unit", "Application", "Application Suite", "Account Name"
  -> Foreign Scan on tmp_l1_ (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.576..4830.866 rows=14237546 loops=1)
        Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 8702454
        CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
        CStore File Size: 87457953966
Planning time: 15.914 ms
Execution time: 11672.927 ms
(9 rows)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2261840.10..2267272.29 rows=1 width=128) (actual time=44412.373..57237.559 rows=407 loops=1)
  -> Sort (cost=2261840.10..2262926.54 rows=434575 width=128) (actual time=44412.371..53115.637 rows=14237546 loops=1)
        Sort Key: "Business Unit", "Application", "Application Suite", "Account Name"
        Sort Method: external merge Disk: 804440kB
        -> Foreign Scan on tmp_l1_ (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.209..5488.539 rows=14237546 loops=1)
              Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
              Rows Removed by Filter: 8702454
              CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
              CStore File Size: 87457953966
Planning time: 19.011 ms
Execution time: 76676.073 ms
(11 rows)
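For anyone who wants to reproduce the comparison: the post does not say how the second plan was obtained. A minimal sketch, assuming it came from switching off hash aggregation for the session with the planner setting `enable_hashagg`:

```sql
-- Plan 1: default planner settings; HashAggregate is chosen.
EXPLAIN ANALYZE
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM __tmp_l1_11259
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';

-- Plan 2: assumption: hash aggregation is disabled for this session only,
-- which steers the planner toward Sort + Unique.
SET enable_hashagg = off;

EXPLAIN ANALYZE
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM __tmp_l1_11259
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';

-- Restore the default so later queries are not affected.
RESET enable_hashagg;
```

`enable_hashagg` only discourages the planner from using a hash plan; it does not change how HashAggregate itself works.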
Is it possible to improve the performance of this query without changing the HashAggregate algorithm in the PostgreSQL source code? If so, how?
Answer 0 (score: 0)
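If you can sort the data by date when you insert it, the query should run faster thanks to the skip indexes. cstore_fdw records the minimum and maximum value of each column for every block, so when the rows are stored in "Date" order the scan can skip blocks that fall entirely outside the requested range instead of reading them and filtering the rows out. Note that this speeds up only the Foreign Scan part of the plan, not the aggregation itself.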
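The suggestion above could be tried roughly as follows. This is only a sketch under stated assumptions: the table name `tmp_l1_sorted` and the server name `cstore_server` are hypothetical, the `text` column types are guesses (only "Date" is known to be compared as a timestamp in the plan), and the real table probably has more columns than the five used in the query.

```sql
-- Hypothetical sorted copy of the table. Server name, column types and
-- the compression option are assumptions; list all real columns here.
CREATE FOREIGN TABLE tmp_l1_sorted (
    "Date"              timestamp,
    "Business Unit"     text,
    "Application"       text,
    "Application Suite" text,
    "Account Name"      text
)
SERVER cstore_server
OPTIONS (compression 'pglz');

-- Load the rows in "Date" order so that each block covers a narrow
-- date range and its min/max values become selective.
INSERT INTO tmp_l1_sorted
SELECT "Date", "Business Unit", "Application", "Application Suite", "Account Name"
FROM __tmp_l1_11259
ORDER BY "Date";

-- Refresh planner statistics for the new table.
ANALYZE tmp_l1_sorted;

-- Re-run the original query against the sorted copy and compare the plan.
EXPLAIN ANALYZE
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM tmp_l1_sorted
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';
```

If block skipping takes effect, "Rows Removed by Filter" in the new plan should drop well below the 8702454 shown in the question, and the Foreign Scan time should shrink accordingly.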