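I have a PostgreSQL query that performs a full table scan where it should be using a (seemingly obvious) index. It involves a join and an IN clause, as follows: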
SELECT sequence.name
FROM sequence, annotation
WHERE sequence.id = annotation.sequence_id
AND lower(annotation.name) LIKE lower('my-query%')
AND sequence.folder_id IN (2504, 6039, 35);
sequence.folder_id is indexed:
# \d sequence_folder_id_idx
Index "public.sequence_folder_id_idx"
Column | Type | Definition
-----------+---------+------------
folder_id | integer | folder_id
btree, for table "public.sequence"
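I would expect it to filter by folder_id first and then by annotation.name. Instead, it runs a full sequential scan on annotation, filtering on lower(annotation.name). Here is the query plan from EXPLAIN (ANALYZE, BUFFERS):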
Hash Join (cost=731.39..21141.97 rows=20 width=16) (actual time=526.315..526.315 rows=0 loops=1)
  Hash Cond: (annotation.sequence_id = sequence.id)
  Buffers: shared hit=695 read=8639
  -> Seq Scan on annotation (cost=0.00..20396.54 rows=3688 width=4) (actual time=526.314..526.314 rows=0 loops=1)
       Filter: (lower((name)::text) ~~ 'my-query%'::text)
       Rows Removed by Filter: 737503
       Buffers: shared hit=695 read=8639
  -> Hash (cost=724.65..724.65 rows=539 width=20) (never executed)
       -> Index Scan using sequence_folder_id_idx on sequence (cost=0.32..724.65 rows=539 width=20) (never executed)
            Index Cond: (folder_id = ANY ('{2504,6039,35}'::integer[]))
Total runtime: 526.365 ms
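As you can see, it scans about 737k rows of annotation and filters them by annotation.name, instead of first restricting the rows by folder_id. Note also that the planner estimated 3688 matching annotation rows, while the actual count was 0.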
If I use ANY(VALUES (2504), ...) instead, the query is very fast:
SELECT sequence.name
FROM sequence, annotation
WHERE sequence.id = annotation.sequence_id
AND lower(annotation.name) LIKE lower('my-query%')
AND sequence.folder_id = ANY(VALUES (2504), (6039), (35));
Nested Loop (cost=0.76..368.71 rows=4 width=16) (actual time=7.795..7.795 rows=0 loops=1)
  Buffers: shared hit=1184
  -> Nested Loop (cost=0.34..150.52 rows=103 width=20) (actual time=0.033..0.497 rows=292 loops=1)
       Buffers: shared hit=96
       -> HashAggregate (cost=0.05..0.08 rows=3 width=4) (actual time=0.013..0.015 rows=3 loops=1)
            -> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=4) (actual time=0.002..0.005 rows=3 loops=1)
       -> Index Scan using sequence_folder_id_idx on sequence (cost=0.29..49.81 rows=34 width=24) (actual time=0.017..0.101 rows=97 loops=3)
            Index Cond: (folder_id = "*VALUES*".column1)
            Buffers: shared hit=96
  -> Index Scan using annotation_sequence_id_idx on annotation (cost=0.42..2.11 rows=1 width=4) (actual time=0.024..0.024 rows=0 loops=292)
       Index Cond: (sequence_id = sequence.id)
       Filter: (lower((name)::text) ~~ 'my-query%'::text)
       Rows Removed by Filter: 12
       Buffers: shared hit=1088
Total runtime: 7.883 ms
Does anyone have a reasonable explanation for why this happens?
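For anyone who wants to dig into this, here is a rough diagnostic sketch; I have not verified that it resolves the issue. The first part discourages sequential scans for a single transaction so the planner's cost estimate for the index-based plan can be compared with the hash join above. The second part is an assumption on my side: an expression index on lower(name) might give the planner both an access path for the prefix LIKE and statistics on the expression, which could improve the rows=3688 estimate. The index name annotation_lower_name_idx is hypothetical.

```sql
-- Part 1: diagnostic only. Compare plans with sequential scans discouraged.
-- SET LOCAL applies only inside this transaction.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT sequence.name
FROM sequence, annotation
WHERE sequence.id = annotation.sequence_id
AND lower(annotation.name) LIKE lower('my-query%')
AND sequence.folder_id IN (2504, 6039, 35);
ROLLBACK;

-- Part 2: an untested idea. The index name is hypothetical.
-- text_pattern_ops lets a btree index serve left-anchored LIKE patterns
-- when the database does not use the C locale.
CREATE INDEX annotation_lower_name_idx
    ON annotation (lower(name) text_pattern_ops);

-- Refresh statistics, including those gathered for the indexed expression.
ANALYZE annotation;
```

If the plan in part 1 shows a lower actual time but a higher estimated cost than the hash join, that would suggest the row estimate on the annotation filter is what steers the planner toward the sequential scan.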