PostgreSQL: sequential scan despite having an index

Time: 2019-07-31 19:24:36

Tags: postgresql postgresql-9.4

I have the following two tables.

  • person_addresses
  • address_normalization

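The person_addresses table has a field named address_id as its primary key, and address_normalization has a corresponding address_id field with an index on it.

Now, when I run EXPLAIN on the following query, I see a sequential scan.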

SELECT
    count(*)
FROM
    mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
    an.address_id = pa.address_id
WHERE
    an.sr_modification_time >= 1550692189468;

-- Result: 2654

Please refer to the query plan (originally posted as a screenshot; the full text version is in Update #2 below).

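As you can see, there is a sequential scan after the hash join. I am not sure I understand this part: why is a sequential scan performed after the hash join?

As the query above shows, the returned record set is also very small.

Is this expected behavior, or am I doing something wrong?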


Update #1: I have indexes on the sr_modification_time field in both tables.

Update #2: Full execution plan

Aggregate  (cost=206944.74..206944.75 rows=1 width=0) (actual time=2807.844..2807.844 rows=1 loops=1)
  Buffers: shared hit=4629 read=82217
  ->  Hash Join  (cost=2881.95..206825.15 rows=47836 width=0) (actual time=0.775..2807.160 rows=2654 loops=1)
        Hash Cond: (pa.address_id = an.address_id)
        Buffers: shared hit=4629 read=82217
        ->  Seq Scan on person_addresses pa  (cost=0.00..135924.93 rows=4911993 width=8) (actual time=0.005..1374.610 rows=4911993 loops=1)
              Buffers: shared hit=4588 read=82217
        ->  Hash  (cost=2432.05..2432.05 rows=35992 width=18) (actual time=0.756..0.756 rows=1005 loops=1)
              Buckets: 4096  Batches: 1  Memory Usage: 41kB
              Buffers: shared hit=41
              ->  Index Scan using mp_member2_address_normalization_mod_time on address_normalization an  (cost=0.43..2432.05 rows=35992 width=18) (actual time=0.012..0.424 rows=1005 loops=1)
                    Index Cond: (sr_modification_time >= 1550692189468::bigint)
                    Buffers: shared hit=41
Planning time: 0.244 ms
Execution time: 2807.885 ms
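
In the plan above, the index scan on address_normalization is estimated at 35992 rows but returns 1005, and the hash join is estimated at 47836 rows but returns 2654. One thing that could be checked, assuming (not confirmed by anything in this thread) that the misestimate comes from stale planner statistics, is to refresh the statistics and look at the plan again. A minimal sketch:

```sql
-- Assumption: the row-count misestimate is caused by outdated statistics.
-- ANALYZE only refreshes planner statistics; it does not change any data.
ANALYZE mp_member2.address_normalization;
ANALYZE mp_member2.person_addresses;

-- Re-run the same query and compare estimated vs. actual row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    count(*)
FROM
    mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
    an.address_id = pa.address_id
WHERE
    an.sr_modification_time >= 1550692189468;
```

If the estimates move closer to the actual counts, the planner may stop preferring the sequential scan on person_addresses; if they do not, the cause lies elsewhere.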

Update #3: I tried a newer timestamp, and this time the plan used index scans.

EXPLAIN (
    ANALYZE
    , buffers
    , format TEXT
) SELECT
    COUNT(*)
FROM
    mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
    an.address_id = pa.address_id
WHERE
    an.sr_modification_time >= 1557507300342;

-- count: 1364

Query plan:

Aggregate  (cost=295.48..295.49 rows=1 width=0) (actual time=2.770..2.770 rows=1 loops=1)
  Buffers: shared hit=1404
  ->  Nested Loop  (cost=4.89..295.43 rows=19 width=0) (actual time=0.038..2.491 rows=1364 loops=1)
        Buffers: shared hit=1404
        ->  Index Scan using mp_member2_address_normalization_mod_time on address_normalization an  (cost=0.43..8.82 rows=14 width=18) (actual time=0.009..0.142 rows=341 loops=1)
              Index Cond: (sr_modification_time >= 1557507300342::bigint)
              Buffers: shared hit=14
        ->  Bitmap Heap Scan on person_addresses pa  (cost=4.46..20.43 rows=4 width=8) (actual time=0.004..0.005 rows=4 loops=341)
              Recheck Cond: (address_id = an.address_id)
              Heap Blocks: exact=360
              Buffers: shared hit=1390
              ->  Bitmap Index Scan on idx_mp_member2_person_addresses_address_id  (cost=0.00..4.46 rows=4 width=0) (actual time=0.003..0.003 rows=4 loops=341)
                    Index Cond: (address_id = an.address_id)
                    Buffers: shared hit=1030
Planning time: 0.214 ms
Execution time: 2.816 ms
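
To compare the two plan shapes for the older timestamp, sequential scans can be discouraged for a single transaction and the query explained again. This is only a diagnostic sketch: enable_seqscan is a standard planner setting, but turning it off is not meant as a permanent fix, and the resulting plan on this data is not known here.

```sql
BEGIN;

-- Diagnostic only: makes sequential scans look very expensive to the planner
-- for the duration of this transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT
    count(*)
FROM
    mp_member2.person_addresses pa
JOIN mp_member2.address_normalization an ON
    an.address_id = pa.address_id
WHERE
    an.sr_modification_time >= 1550692189468;

-- SET LOCAL is undone automatically when the transaction ends.
ROLLBACK;
```

Comparing the estimated cost and actual time of this plan with the hash join plan shows how far apart the planner thinks the two alternatives are.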

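1 answer:

Answer 0: (score: 0)

This is expected behavior: because you do not have an index on sr_modification_time, after building the hash join the database has to scan the whole set to check sr_modification_time for every row.

You should create:

  • an index on (sr_modification_time)
  • a composite index on (address_id, sr_modification_time)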
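
The two suggested indexes could be written roughly as follows. The answer does not say which table they belong on; this sketch assumes address_normalization, since that is where the query filters on sr_modification_time. The index names are hypothetical. Note that Update #1 in the question says an index on sr_modification_time already exists, so the first statement may be redundant.

```sql
-- Assumption: both indexes go on address_normalization.
-- Index names are made up for illustration.

-- Single-column index on the filter column.
CREATE INDEX idx_an_sr_modification_time
    ON mp_member2.address_normalization (sr_modification_time);

-- Composite index on the join column plus the filter column.
CREATE INDEX idx_an_address_id_sr_modification_time
    ON mp_member2.address_normalization (address_id, sr_modification_time);
```

On a busy table, CREATE INDEX CONCURRENTLY may be preferable because it avoids blocking writes while the index is built.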