Question

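I have a PG query that does a full table scan instead of using an index that seems like the obvious choice. It involves a join and an IN, like this: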

SELECT sequence.name
FROM sequence, annotation
WHERE sequence.id = annotation.sequence_id
AND lower(annotation.name) LIKE lower('my-query%')
AND sequence.folder_id IN (2504,  6039, 35);

sequence.folder_id is indexed:

# \d sequence_folder_id_idx
Index "public.sequence_folder_id_idx"
  Column   |  Type   | Definition
-----------+---------+------------
 folder_id | integer | folder_id
btree, for table "public.sequence"

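I expected it to filter by folder_id first and then by annotation.name. Instead, it does a full scan of annotation, applying the lower(annotation.name) filter. Here is the query plan from EXPLAIN (ANALYZE, BUFFERS):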

Hash Join  (cost=731.39..21141.97 rows=20 width=16) (actual time=526.315..526.315 rows=0 loops=1)
  Hash Cond: (annotation.sequence_id = sequence.id)
  Buffers: shared hit=695 read=8639
  ->  Seq Scan on annotation  (cost=0.00..20396.54 rows=3688 width=4) (actual time=526.314..526.314 rows=0 loops=1)
        Filter: (lower((name)::text) ~~ 'my-query%'::text)
        Rows Removed by Filter: 737503
        Buffers: shared hit=695 read=8639
  ->  Hash  (cost=724.65..724.65 rows=539 width=20) (never executed)
        ->  Index Scan using sequence_folder_id_idx on sequence  (cost=0.32..724.65 rows=539 width=20) (never executed)
              Index Cond: (folder_id = ANY ('{2504,6039,35}'::integer[]))
Total runtime: 526.365 ms

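As you can see, it scans all 737,503 rows of annotation and filters them on lower(annotation.name), instead of first narrowing the rows down by folder_id. The index scan on sequence is never executed.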
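The plan's own numbers show how far the row estimates are from reality. Below is a minimal sketch that pulls the estimated and actual row counts out of the EXPLAIN ANALYZE text above. The regex and the helper name row_estimates are hypothetical, not part of any PostgreSQL tool, and the sketch assumes the plain-text EXPLAIN format shown here, handling only the kinds of node lines that appear in this plan.

```python
import re

# Hypothetical pattern for a plan node line, e.g.
# "->  Seq Scan on annotation  (cost=0.00..20396.54 rows=3688 width=4) (actual time=... rows=0 loops=1)"
# Only the plain-text EXPLAIN format shown in the question is assumed.
NODE_RE = re.compile(
    r"^\s*(?:->\s+)?(?P<node>[^(]+?)\s+\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)"
    r"\s+\((?:actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=\d+|never executed)\)"
)

# First plan from the question (the IN (...) version).
PLAN = """\
Hash Join  (cost=731.39..21141.97 rows=20 width=16) (actual time=526.315..526.315 rows=0 loops=1)
  Hash Cond: (annotation.sequence_id = sequence.id)
  Buffers: shared hit=695 read=8639
  ->  Seq Scan on annotation  (cost=0.00..20396.54 rows=3688 width=4) (actual time=526.314..526.314 rows=0 loops=1)
        Filter: (lower((name)::text) ~~ 'my-query%'::text)
        Rows Removed by Filter: 737503
        Buffers: shared hit=695 read=8639
  ->  Hash  (cost=724.65..724.65 rows=539 width=20) (never executed)
        ->  Index Scan using sequence_folder_id_idx on sequence  (cost=0.32..724.65 rows=539 width=20) (never executed)
              Index Cond: (folder_id = ANY ('{2504,6039,35}'::integer[]))
Total runtime: 526.365 ms
"""


def row_estimates(plan_text):
    """Return (node, estimated_rows, actual_rows) per node line; actual is None if never executed."""
    result = []
    for line in plan_text.splitlines():
        m = NODE_RE.match(line)
        if m:
            act = int(m.group("act")) if m.group("act") is not None else None
            result.append((m.group("node"), int(m.group("est")), act))
    return result


if __name__ == "__main__":
    for node, est, act in row_estimates(PLAN):
        shown = "never executed" if act is None else act
        print(f"{node}: estimated {est}, actual {shown}")
```

For this plan it reports 3688 estimated vs. 0 actual rows for the Seq Scan on annotation, and 539 estimated rows for the sequence index scan, which never ran.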

If I use ANY (VALUES (2504), ...) instead, it works well (about 8 ms instead of about 526 ms):

SELECT sequence.name
FROM sequence, annotation
WHERE sequence.id = annotation.sequence_id
AND lower(annotation.name) LIKE lower('my-query%')
AND sequence.folder_id = ANY(VALUES (2504), (6039), (35));

Nested Loop  (cost=0.76..368.71 rows=4 width=16) (actual time=7.795..7.795 rows=0 loops=1)
  Buffers: shared hit=1184
  ->  Nested Loop  (cost=0.34..150.52 rows=103 width=20) (actual time=0.033..0.497 rows=292 loops=1)
        Buffers: shared hit=96
        ->  HashAggregate  (cost=0.05..0.08 rows=3 width=4) (actual time=0.013..0.015 rows=3 loops=1)
              ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=4) (actual time=0.002..0.005 rows=3 loops=1)
        ->  Index Scan using sequence_folder_id_idx on sequence  (cost=0.29..49.81 rows=34 width=24) (actual time=0.017..0.101 rows=97 loops=3)
              Index Cond: (folder_id = "*VALUES*".column1)
              Buffers: shared hit=96
  ->  Index Scan using annotation_sequence_id_idx on annotation  (cost=0.42..2.11 rows=1 width=4) (actual time=0.024..0.024 rows=0 loops=292)
        Index Cond: (sequence_id = sequence.id)
        Filter: (lower((name)::text) ~~ 'my-query%'::text)
        Rows Removed by Filter: 12
        Buffers: shared hit=1088
Total runtime: 7.883 ms
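Putting the cost figures from the two plans side by side makes the question sharper. This is a back-of-the-envelope sketch only. It assumes, without verification, that a nested loop for the IN version would cost roughly the sequence index scan plus one annotation_sequence_id_idx probe per estimated sequence row, reusing the 2.11 per-probe cost from the VALUES plan. The helper naive_nested_loop_cost is hypothetical and is not PostgreSQL's real nested-loop costing, which also accounts for per-row CPU costs and repeated-scan caching.

```python
# Cost figures copied from the two EXPLAIN outputs in the question (planner units, not ms).
HASH_JOIN_TOTAL = 21141.97      # plan actually chosen for the IN (...) query
SEQUENCE_INDEX_SCAN = 724.65    # Index Scan on sequence_folder_id_idx, IN version
SEQUENCE_ROWS_IN = 539          # estimated sequence rows for IN (2504, 6039, 35)
ANNOTATION_PROBE = 2.11         # per-loop Index Scan on annotation_sequence_id_idx, VALUES version


def naive_nested_loop_cost(outer_cost, outer_rows, inner_cost_per_probe):
    """Hypothetical rough estimate: outer scan cost plus one inner probe per outer row.
    NOT PostgreSQL's actual nested-loop cost formula."""
    return outer_cost + outer_rows * inner_cost_per_probe


if __name__ == "__main__":
    estimate = naive_nested_loop_cost(SEQUENCE_INDEX_SCAN, SEQUENCE_ROWS_IN, ANNOTATION_PROBE)
    print(f"naive nested loop estimate: {estimate:.2f}")
    print(f"chosen hash join total:     {HASH_JOIN_TOTAL:.2f}")
    print(f"ratio: {HASH_JOIN_TOTAL / estimate:.1f}x")
```

Under these rough assumptions, the index-driven nested loop looks about an order of magnitude cheaper than the hash join the planner chose, which is exactly the behavior being asked about.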

Does anyone have a reasonable explanation for why this happens?

Postgres does a full table scan instead of applying the other WHERE clause first
