Why does Postgres refuse to use a composite index on some setups?

Time: 2017-02-19 09:14:22

Tags: postgresql indexing composite

So, I have a table that looks like this:

                                 Table "public.rule_traffic"
          Column       |  Type   |                       Modifiers
     id                | bigint  | not null default nextval('rule_traffic_seq'::regclass)
     device_id         | integer | not null
     version_id        | integer | not null
     policy_name       | text    |
     rule_uid          | uuid    | not null
     traffic_hash_code | bigint  | not null
     action            | integer |

And these indexes:

"rule_traffic_pkey" PRIMARY KEY, btree (id)
"unique_device_id_version_id_policy_name_uid_in_rule_traffic" UNIQUE, btree (device_id, version_id, policy_name, rule_uid)

When I run a test query on my setup (and on many others), the plan shows that it really does use the index unique_device_id_version_id_policy_name_uid_in_rule_traffic defined above:

                                                                             QUERY PLAN
HashAggregate  (cost=8.29..8.30 rows=1 width=56) (actual time=1.563..1.563 rows=0 loops=1)
->  Index Scan using unique_device_id_version_id_policy_name_uid_in_rule_traffic on rule_traffic this_  (cost=0.00..8.28 rows=1 width=56) (actual time=1.558..1.558 rows=0 loops=1)
     Index Cond: ((device_id = 11) AND (policy_name IS NULL))
     Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
Total runtime: 1.704 ms

But on one setup the query plan is completely different (a sequential scan):

                                                                                    QUERY PLAN
HashAggregate  (cost=150538.23..150538.25 rows=2 width=56) (actual time=2403.600..2403.601 rows=2 loops=1)
->  Seq Scan on rule_traffic this_  (cost=0.00..150538.20 rows=4 width=56) (actual time=2354.481..2403.573 rows=2 loops=1)
     Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
Total runtime: 2403.661 ms

I tried running VACUUM FULL / ANALYZE on the table, but it made no difference.

Does anyone know why Postgres decides not to use the composite index?

Update 1:

Trying to force it not to use the sequential scan:

securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;

QUERY PLAN
 HashAggregate  (cost=209498.38..209498.40 rows=2 width=56) (actual time=2475.980..2475.981 rows=2 loops=1)
   ->  Seq Scan on rule_traffic this_  (cost=0.00..209498.35 rows=4 width=56) (actual time=1631.945..2475.950 rows=3 loops=1)
     Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
 Total runtime: 2476.038 ms
(4 rows)

With enable_seqscan = false:

securetrack=# SET enable_seqscan=false;
SET
securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;
                                                                                           QUERY PLAN
 HashAggregate  (cost=371469.08..371469.10 rows=2 width=56) (actual time=2936.608..2936.610 rows=2 loops=1)
   ->  Bitmap Heap Scan on rule_traffic this_  (cost=197981.02..371469.05 rows=4 width=56) (actual time=2308.843..2936.577 rows=3 loops=1)
     Recheck Cond: ((device_id = 11) AND (policy_name IS NULL))
     Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
     ->  Bitmap Index Scan on unique_device_id_version_id_policy_name_uid_in_rule_traffic  (cost=0.00..197981.02 rows=5774287 width=0) (actual time=1283.603..1283.603 rows=5849739 loops=1)
           Index Cond: ((device_id = 11) AND (policy_name IS NULL))
 Total runtime: 2936.680 ms
(7 rows)

The cost actually looks higher. How can that be?

1 answer:

Answer 0 (score: 3):

PostgreSQL is doing the right thing.

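If you look at the plan for the query where you forced it to use the index, you will see that the index scan finds 5849739 rows matching (device_id = 11) AND (policy_name IS NULL), and every one of them then has to be rechecked against the table. Only 3 rows survive the rule_uid filter.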

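Scanning such a large part of the index and then rechecking every table row it finds is more expensive than a sequential scan of the whole table (sequential reads are usually faster than random-access reads).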
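A rough way to see how unselective that condition is on the slow setup is sketched below. The column aliases are illustrative, and the second query shows what the planner believes about the two columns rather than exact counts.

```sql
-- Count the rows that match the part of the WHERE clause the index can use,
-- next to the total row count. This query itself reads the whole table.
SELECT count(*) AS total_rows,
       sum(CASE WHEN device_id = 11 AND policy_name IS NULL
                THEN 1 ELSE 0 END) AS matching_rows
FROM rule_traffic;

-- Planner statistics for the two columns (refreshed by ANALYZE).
SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'rule_traffic'
  AND attname IN ('device_id', 'policy_name');
```

If matching_rows is a large share of total_rows, as the 5849739 rows returned by the bitmap index scan suggest, then the sequential scan is the cheaper plan and the planner's choice is reasonable.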

Using EXPLAIN (ANALYZE, BUFFERS) would be instructive, because it shows the actual number of database blocks accessed.
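As a sketch, the comparison could be run like this, reusing the query from the question. It assumes PostgreSQL 9.0 or later, where the BUFFERS option is available. Each plan node then carries a "Buffers:" line with the blocks it touched.

```sql
-- Plan chosen by the planner (sequential scan on the slow setup).
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(this_.id) AS y0_, this_.rule_uid AS y1_, this_.policy_name AS y2_
FROM rule_traffic this_
WHERE this_.device_id = 11
  AND this_.policy_name IS NULL
  AND this_.rule_uid IN ('f6c0dc29-e741-4f9a-adf1-f11d18768af3',
                         'c1a12087-2d85-4e44-a115-f9cad7ec915e')
GROUP BY this_.rule_uid, this_.policy_name;

-- Same query with sequential scans discouraged, for this session only.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(this_.id) AS y0_, this_.rule_uid AS y1_, this_.policy_name AS y2_
FROM rule_traffic this_
WHERE this_.device_id = 11
  AND this_.policy_name IS NULL
  AND this_.rule_uid IN ('f6c0dc29-e741-4f9a-adf1-f11d18768af3',
                         'c1a12087-2d85-4e44-a115-f9cad7ec915e')
GROUP BY this_.rule_uid, this_.policy_name;
RESET enable_seqscan;
```

Comparing the buffer counts of the two plans shows how many blocks each one has to touch, which is the work the cost estimates are trying to predict. The WHERE clause here is written with IN instead of the original OR branches; it selects the same rows.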