Question

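I recently ran into and fixed a problem, but I don't understand why there was a problem in the first place.

Simplified, I have 3 tables in a Postgres 10.5 database: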

entities (id, name)
entities_to_stuff(
    id, 
    entities_id -> fk entities.id, 
    stuff_id -> fk stuff.id, 
    unique constraint (entities_id, stuff_id)
)
stuff(id, name)
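
For anyone who wants to reproduce this, the schema above could be written out roughly as follows. This is only a sketch: the column types, the primary keys, and the NOT NULL constraints are assumptions, since the question gives only column names. The constraint name uq_entities_to_stuff is taken from the execution plans further down.

```sql
-- Assumed types and primary keys; only the column names come from the question.
CREATE TABLE entities (
    id   bigserial PRIMARY KEY,
    name text
);

CREATE TABLE stuff (
    id   bigserial PRIMARY KEY,
    name text
);

CREATE TABLE entities_to_stuff (
    id          bigserial PRIMARY KEY,
    entities_id bigint NOT NULL REFERENCES entities (id),
    stuff_id    bigint NOT NULL REFERENCES stuff (id),
    -- The column order listed here becomes the column order of the backing index.
    CONSTRAINT uq_entities_to_stuff UNIQUE (entities_id, stuff_id)
);
```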

After inserting about 200k records, a select like this:

select * from entities_to_stuff where entities_id = 1;

started taking 100-400 ms.

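As I understand it, creating a unique constraint also creates an index on the unique fields. So I have an index on (entities_id, stuff_id), with entities_id being the "leftmost" column.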
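
One way to confirm that assumption, rather than rely on it, is to ask the catalog which indexes actually exist on the table and in what column order. A minimal check, using the standard pg_indexes view:

```sql
-- Lists every index on the table together with its full definition.
-- The column order inside the parentheses of indexdef is the real index order.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'entities_to_stuff';
```

If the definition printed for the unique index lists stuff_id before entities_id, then entities_id is not the leading column of that index after all.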

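According to the docs, queries that include the leftmost column are the most efficient (postgres docs on this), so I assumed this index would work for me.

So I checked the execution plan: it did not use the index. So, just to make sure, I did: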

SET enable_seqscan = OFF;

and reran the query. It still took over 100 ms most of the time.

Then I got annoyed and created this index:
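
For reference, the check described above can be run as a single session like the sketch below. The BUFFERS option is an addition that was not part of the original test; it reports how many pages the scan touched, which helps tell a narrow index lookup apart from a scan over the whole index.

```sql
-- Discourage sequential scans for this session only.
SET enable_seqscan = off;

-- Run the query for real and report timings and page accesses.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM entities_to_stuff WHERE entities_id = 1;

-- Restore the default planner setting afterwards.
RESET enable_seqscan;
```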

create index "idx_entities_id" on "entities_to_stuff" ("entities_id");

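Suddenly it takes 0.2 ms or even less to run, and the execution plan uses the new index even with sequential scans enabled.

How is this index orders of magnitude faster than the existing one?

Edit:

Execution plan after creating the additional index: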

Index Scan using idx_entities_id on entities_to_stuff (cost=0.00..12.04 rows=2 width=32) (actual time=0.049..0.050 rows=1 loops=1)
  Index Cond: (entities_id = 199283)
Planning time: 0.378 ms
Execution time: 0.073 ms

Plan with only the unique constraint, enable_seqscan = on:

Gather  (cost=1000.00..38679.87 rows=2 width=32) (actual time=344.321..1740.861 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 0
  ->  Parallel Seq Scan on entities_to_stuff  (cost=0.00..37679.67 rows=1 width=32) (actual time=344.088..1739.684 rows=1 loops=1)
        Filter: (entities_id = 199283)
        Rows Removed by Filter: 2907419
Planning time: 0.241 ms
Execution time: 740.888 ms

Plan with the constraint, enable_seqscan = off:

Index Scan using uq_entities_to_stuff on entities_to_stuff  (cost=0.43..66636.34 rows=2 width=32) (actual time=0.385..553.066 rows=1 loops=1)
  Index Cond: (entities_id = 199283)
Planning time: 0.082 ms
Execution time: 553.103 ms
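
As a further diagnostic, the on-disk size of the two indexes can be compared. This is only a sketch of one possible check and does not by itself explain the difference; the index names are the ones shown in the plans above.

```sql
-- Compare the size of the unique-constraint index and the single-column index.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('uq_entities_to_stuff', 'idx_entities_id');
```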

Postgres index performance unclear

0 answers: