Question

I have a very simple query on a large foreign (columnar store) table:

select distinct "Business Unit", "Application", "Application Suite", "Account Name" 
    from __tmp_l1_11259 
    where "Date" between '2017-10-01' and '2017-10-31';

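I ran this query in SQL Server with the hash group hint, and it executes in less than two seconds. PostgreSQL has two ways to do it: using HashAggregate, or sorting the rows and grouping the unique ones. Here are the plans: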

                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2169041.54..2169041.55 rows=1 width=128) (actual time=11671.983..11672.065 rows=407 loops=1)
  Group Key: "Business Unit", "Application", "Application Suite", "Account Name"
  ->  Foreign Scan on tmp_l1_  (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.576..4830.866 rows=14237546 loops=1)
        Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
        Rows Removed by Filter: 8702454
        CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
        CStore File Size: 87457953966
 Planning time: 15.914 ms
 Execution time: 11672.927 ms
(9 rows)

                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2261840.10..2267272.29 rows=1 width=128) (actual time=44412.373..57237.559 rows=407 loops=1)
   ->  Sort  (cost=2261840.10..2262926.54 rows=434575 width=128) (actual time=44412.371..53115.637 rows=14237546 loops=1)
         Sort Key: "Business Unit", "Application", "Application Suite", "Account Name"
         Sort Method: external merge  Disk: 804440kB
         ->  Foreign Scan on tmp_l1_  (cost=0.00..2164695.79 rows=434575 width=128) (actual time=6.209..5488.539 rows=14237546 loops=1)
               Filter: (("Date" >= '2017-10-01 00:00:00'::timestamp without time zone) AND ("Date" <= '2017-10-31 00:00:00'::timestamp without time zone))
               Rows Removed by Filter: 8702454
               CStore File: /datadrive/postgresql/cstore_fdw/16507/16540
               CStore File Size: 87457953966
 Planning time: 19.011 ms
 Execution time: 76676.073 ms
(11 rows)
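
For anyone reproducing the two plans, a minimal sketch follows. It assumes the second plan was obtained by turning off the planner's `enable_hashagg` setting for the session; the question does not say how the Sort + Unique plan was actually forced.

```sql
-- Plan 1: default settings; the planner chooses HashAggregate.
EXPLAIN ANALYZE
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM __tmp_l1_11259
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';

-- Plan 2 (assumption): discourage hash aggregation for this session,
-- so the planner falls back to Sort + Unique.
SET enable_hashagg = off;

EXPLAIN ANALYZE
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM __tmp_l1_11259
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';

-- Restore the default afterwards.
RESET enable_hashagg;
```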

Is it possible to improve the performance of this query without changing the HashAggregate algorithm in the PostgreSQL source code? If so, how?

Answer 1

If you can sort the data by date when you insert it, the query could run faster thanks to skip indexes.
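
A minimal sketch of what that could look like, assuming a second cstore_fdw foreign table with the same column definitions has already been created; the name `tmp_l1_sorted` is hypothetical. cstore_fdw keeps minimum and maximum values for each block of rows, so when the rows are loaded in "Date" order, a date-range filter can skip blocks whose range falls entirely outside the predicate.

```sql
-- Assumption: tmp_l1_sorted is a hypothetical cstore_fdw foreign table
-- created beforehand with the same columns as __tmp_l1_11259.
-- Load it in "Date" order so each block covers a narrow date range.
INSERT INTO tmp_l1_sorted
SELECT *
FROM __tmp_l1_11259
ORDER BY "Date";

-- The same query, run against the sorted copy.
SELECT DISTINCT "Business Unit", "Application", "Application Suite", "Account Name"
FROM tmp_l1_sorted
WHERE "Date" BETWEEN '2017-10-01' AND '2017-10-31';
```

This would mainly reduce the work done in the Foreign Scan step (the rows currently discarded by the filter); the aggregation over the matching rows is unchanged.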
