Postgres index behavior with an OR condition in the WHERE clause

Date: 2019-08-29 08:56:12

Tags: database postgresql performance indexing

I have a booking table and a customer table with the following schemas:

Booking table:

                                  Table "public.booking"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 deleted               | boolean                  |           |          |
 booking_id            | character varying        |           | not null |
 reference_number      | character varying        |           |          |
 checkin_date          | timestamp with time zone |           |          |
 checkout_date         | timestamp with time zone |           |          |
 status                | character varying        |           |          |
 version               | integer                  |           | not null |
 comments              | text                     |           |          |
 extra_information     | json                     |           |          |
 cancellation_reason   | character varying        |           |          |
 cancellation_datetime | timestamp with time zone |           |          |
 created_at            | timestamp with time zone |           | not null | now()
 modified_at           | timestamp with time zone |           | not null | now()
Indexes:
    "booking_pkey" PRIMARY KEY, btree (booking_id)
    "ix_booking_reference_number" UNIQUE, btree (reference_number)
    "idx_booking_sort_checkin" btree (checkin_date, created_at)
    "idx_booking_sort_checkout" btree (checkout_date, created_at)
    "idx_booking_stay_dates" btree (checkin_date, checkout_date DESC)
    "ix_booking_deleted" btree (deleted)
    "ix_booking_status" btree (status)
    "trgm_booking_ref_num" gist (reference_number gist_trgm_ops)

Customer table:

                          Table "public.booking_customer"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 deleted               | boolean                  |           |          |
 customer_id           | character varying        |           | not null |
 booking_id            | character varying        |           | not null |
 first_name            | character varying        |           |          |
 last_name             | character varying        |           |          |
 phone                 | character varying        |           |          |
 email                 | character varying        |           |          |
 created_at            | timestamp with time zone |           | not null | now()
 modified_at           | timestamp with time zone |           | not null | now()
Indexes:
    "booking_customer_pkey" PRIMARY KEY, btree (customer_id, booking_id)
    "book_cust_idx" btree (booking_id, customer_id)
    "idx_booking_customer_full_name" btree (((first_name::text || ' '::text) || last_name::text))
    "ix_booking_customer_deleted" btree (deleted)
    "ix_booking_customer_email" btree (email)
    "ix_booking_customer_first_name" btree (first_name)
    "ix_booking_customer_last_name" btree (last_name)
    "ix_booking_customer_phone" btree (phone)
    "trgm_cust_first_name" gist (first_name gist_trgm_ops)
    "trgm_cust_full_name" gist (((first_name::text || ' '::text) || last_name::text) gist_trgm_ops)
    "trgm_cust_last_name" gist (last_name gist_trgm_ops)

I am running the following query:

EXPLAIN ANALYZE 
SELECT bk.booking_id, bk.created_at, bk.checkin_date 
FROM booking bk 
WHERE bk.reference_number = '9123889123' OR 
    EXISTS (
        SELECT 1 FROM booking_customer cust 
        WHERE cust.booking_id = bk.booking_id AND (
            cust.email = '9123889123' OR 
            cust.phone = '9123889123'
        ) AND 
        cust.deleted = false
    )
ORDER BY bk.checkin_date DESC, bk.created_at DESC 
LIMIT 10 OFFSET 0;

This results in the following query plan:

    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..365.54 rows=10 width=31) (actual time=57.861..865.883 rows=3 loops=1)
   ->  Index Scan Backward using idx_booking_sort_checkin on booking bk  (cost=0.42..14419601.66 rows=394937 width=31) (actual time=57.858..865.877 rows=3 loops=1)
         Filter: (((reference_number)::text = '9916092871'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         Rows Removed by Filter: 676681
         SubPlan 1
           ->  Bitmap Heap Scan on booking_customer cust  (cost=14.08..18.10 rows=1 width=0) (never executed)
                 Recheck Cond: (((booking_id)::text = (bk.booking_id)::text) AND (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text)))
                 Filter: (NOT deleted)
                 ->  BitmapAnd  (cost=14.08..14.08 rows=1 width=0) (never executed)
                       ->  Bitmap Index Scan on book_cust_idx  (cost=0.00..4.49 rows=8 width=0) (never executed)
                             Index Cond: ((booking_id)::text = (bk.booking_id)::text)
                       ->  BitmapOr  (cost=9.34..9.34 rows=65 width=0) (never executed)
                             ->  Bitmap Index Scan on ix_booking_customer_email  (cost=0.00..4.67 rows=33 width=0) (never executed)
                                   Index Cond: ((email)::text = '9916092871'::text)
                             ->  Bitmap Index Scan on ix_booking_customer_phone  (cost=0.00..4.67 rows=32 width=0) (never executed)
                                   Index Cond: ((phone)::text = '9916092871'::text)
         SubPlan 2
           ->  Bitmap Heap Scan on booking_customer cust_1  (cost=9.38..264.83 rows=65 width=32) (actual time=0.047..0.050 rows=3 loops=1)
                 Recheck Cond: (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text))
                 Filter: (NOT deleted)
                 Heap Blocks: exact=3
                 ->  BitmapOr  (cost=9.38..9.38 rows=65 width=0) (actual time=0.042..0.042 rows=0 loops=1)
                       ->  Bitmap Index Scan on ix_booking_customer_email  (cost=0.00..4.67 rows=33 width=0) (actual time=0.019..0.019 rows=0 loops=1)
                             Index Cond: ((email)::text = '9916092871'::text)
                       ->  Bitmap Index Scan on ix_booking_customer_phone  (cost=0.00..4.67 rows=32 width=0) (actual time=0.023..0.023 rows=3 loops=1)
                             Index Cond: ((phone)::text = '9916092871'::text)
 Planning time: 0.782 ms
 Execution time: 865.956 ms
(28 rows)

As you can see, Postgres applies the reference_number predicate on the booking table as a Filter, even though I have an index on that column.

However, if I remove the OR condition from the WHERE clause, it starts using the index:

For this query:

EXPLAIN ANALYZE 
SELECT bk.booking_id, bk.created_at, bk.checkin_date 
FROM booking bk 
WHERE bk.reference_number = '9123889123'
ORDER BY bk.checkin_date DESC, bk.created_at DESC 
LIMIT 10 OFFSET 0;

the query plan looks like this:

                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.45..8.46 rows=1 width=31) (actual time=0.021..0.021 rows=0 loops=1)
   ->  Sort  (cost=8.45..8.46 rows=1 width=31) (actual time=0.020..0.020 rows=0 loops=1)
         Sort Key: checkin_date DESC, created_at DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using ix_booking_reference_number on booking bk  (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: ((reference_number)::text = '9123889123'::text)
 Planning time: 0.334 ms
 Execution time: 0.042 ms

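I don't understand why the behavior changes. Both reference_number and booking_id have unique indexes. Also, what are the two SubPlans in the first query, and do they affect query performance as well?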

I created the GiST index on reference_number so that LIKE queries I run elsewhere can use an index.
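For reference, an index like trgm_booking_ref_num in the schema above could have been created roughly as follows. This is a sketch reconstructed from the \d output, not the exact statement that was run; the LIKE pattern in the second statement is a made-up example.

```sql
-- gist_trgm_ops is provided by the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GiST index, matching the definition shown for "trgm_booking_ref_num".
CREATE INDEX trgm_booking_ref_num
    ON booking USING gist (reference_number gist_trgm_ops);

-- Hypothetical substring search of the kind this index is meant to support.
SELECT booking_id
FROM booking
WHERE reference_number LIKE '%8891%';
```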

Is there anything I can change to improve the query's performance?

I have about 500,000 records in the booking table and about 2 million in the customer table.

1 Answer:

Answer 0: (score: 2)

The two queries are completely different, so it is no surprise that they perform differently.

The first query may be faster if you tell PostgreSQL not to use the index idx_booking_sort_checkin:

ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC

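The problem is that PostgreSQL thinks the fastest strategy is to scan booking in ORDER BY order using the index until it has found enough rows that satisfy the condition. But it does not know which values the subquery will return, so it cannot tell whether it will find 10 rows quickly.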

In fact, that expectation is badly wrong: there are only three matching rows in total, so the whole table ends up being scanned this way.
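The numbers in the first plan make the misestimate visible. As a rough sketch, and assuming the planner charges a LIMIT node the scan's startup cost plus a share of its run cost proportional to the fraction of rows it expects to read, the Limit cost can be reproduced from the Index Scan line:

```sql
-- Figures copied from the Limit and Index Scan nodes of the first plan.
-- Assumption: the run cost is prorated by (rows wanted / rows estimated).
SELECT 0.42                      -- startup cost of the index scan
       + (14419601.66 - 0.42)    -- run cost of walking the entire index
         * 10.0 / 394937         -- LIMIT 10 out of 394937 estimated matches
       AS estimated_limit_cost;
```

This comes out at roughly 365.5, in line with the 365.54 shown on the Limit node. The planner expected 394937 rows to pass the filter and therefore planned to stop after a tiny fraction of the index. Only 3 rows actually matched, so the scan discarded 676681 rows before it ran out of input.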

Using an ORDER BY clause that does not match the index prevents PostgreSQL from choosing this strategy.
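Applied to the query from the question, the change would look like the sketch below. Everything except the ORDER BY is copied unchanged. The resulting plan is not shown here, so run it with EXPLAIN ANALYZE and check the output rather than assuming it is faster.

```sql
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123' OR
    EXISTS (
        SELECT 1 FROM booking_customer cust
        WHERE cust.booking_id = bk.booking_id AND (
            cust.email = '9123889123' OR
            cust.phone = '9123889123'
        ) AND
        cust.deleted = false
    )
-- Adding a zero interval leaves the sort order unchanged, but the second
-- sort key is no longer the bare created_at column, so the ORDER BY no
-- longer matches idx_booking_sort_checkin (checkin_date, created_at).
ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC
LIMIT 10 OFFSET 0;
```

One caveat: on newer PostgreSQL releases that support incremental sort, the planner may still read the index for the leading checkin_date key and sort only on the expression, so the plan is worth checking on the version actually in use.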