Question

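I noticed that Postgres does not use an index for range queries on a partitioned table.

The parent table and its partitions each have a btree index on the date column.

A query like this: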

select * from parent_table where date >= '2015-07-01';

does not use the index.

EXPLAIN output:

Append  (cost=0.00..106557.52 rows=3263963 width=128)
->  Seq Scan on parent_table  (cost=0.00..0.00 rows=1 width=640)
    Filter: (date >= '2015-07-01'::date)
->  Seq Scan on z_partition_2015_07  (cost=0.00..106546.02 rows=3263922 width=128)
    Filter: (date >= '2015-07-01'::date)
->  Seq Scan on z_partition_2015_08  (cost=0.00..11.50 rows=40 width=640)
    Filter: (date >= '2015-07-01'::date)

But a query like this:

select * from parent_table where date = '2015-07-01'

does use the index.

EXPLAIN output:

Append  (cost=0.00..30400.95 rows=107602 width=128)
->  Seq Scan on parent_table  (cost=0.00..0.00 rows=1 width=640)
    Filter: (date = '2015-07-01'::date)
->  Index Scan using z_partition_2015_07_date on z_partition_2015_07  (cost=0.43..30400.95 rows=107601 width=128)
    Index Cond: (date = '2015-07-01'::date)

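When I run the same queries on other, ordinary tables that have an index on date, both queries use the index.

Is there anything special we need to do for indexes on partitioned tables?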

Answer 1

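I assume you are aware that "partitions" in Postgres are separate tables. Indexes are typically not used when a query retrieves a large part of a table (more than roughly 5 %, though this depends on many details), because scanning the table sequentially is usually faster in that case.

Also, your first query seems to select all rows from the partitions involved. No use for indexes ...

Typically, equality predicates with = are more selective than predicates with >=. Think about it:

Your first query with date >= '2015-07-01' retrieves all rows from the partition (I am guessing; I would need to see the exact definitions). Using an index would only add overhead. Your second query with date = '2015-07-01', however, fetches only a small percentage of the rows, so Postgres expects an index scan to be faster.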
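The planner's own estimates in the question support this: the equality predicate is expected to match 107601 of the 3263922 rows in z_partition_2015_07 (about 3.3 %), while the range predicate is expected to match all 3263922. To check the real numbers rather than the estimates, something like the following sketch could be run. It assumes the table and column names from the question, and that a one-off full scan of the partition is acceptable.

```sql
-- Count how many rows of the July partition each predicate matches.
-- Table and column names are taken from the question.
SELECT count(*) AS total_rows,
       sum(CASE WHEN date >= '2015-07-01' THEN 1 ELSE 0 END) AS range_rows,
       sum(CASE WHEN date =  '2015-07-01' THEN 1 ELSE 0 END) AS equal_rows
FROM   z_partition_2015_07;

-- Show the CHECK constraints that define the partition's date range.
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM   pg_constraint
WHERE  conrelid = 'z_partition_2015_07'::regclass
AND    contype = 'c';
```

If range_rows is equal or close to total_rows, a sequential scan is the expected plan for the range query.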

Answer 2

Maybe it actually is faster this way? Run your query, then execute the following:

SET enable_seqscan=false

Then run it again.
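A minimal sketch of that comparison, assuming the query from the question and a throwaway session (the setting only affects the current session):

```sql
-- Baseline: plan and actual run time with default planner settings.
EXPLAIN ANALYZE
SELECT * FROM parent_table WHERE date >= '2015-07-01';

-- Discourage sequential scans for this session only.
-- This does not forbid them; it just makes the planner avoid them
-- whenever another access method is available.
SET enable_seqscan = off;

-- Same query again; compare the plan and the reported run time.
EXPLAIN ANALYZE
SELECT * FROM parent_table WHERE date >= '2015-07-01';

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```

If the second run is slower, the planner's choice of a sequential scan was the right one. Note that EXPLAIN ANALYZE actually executes the query, and that the second run may benefit from data cached by the first, so repeating each measurement gives a fairer comparison.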
