I have a booking table and a customer table with the following schemas:
Booking table:
Table "public.booking"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
booking_id | character varying | | not null |
reference_number | character varying | | |
checkin_date | timestamp with time zone | | |
checkout_date | timestamp with time zone | | |
status | character varying | | |
version | integer | | not null |
comments | text | | |
extra_information | json | | |
cancellation_reason | character varying | | |
cancellation_datetime | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_pkey" PRIMARY KEY, btree (booking_id)
"ix_booking_reference_number" UNIQUE, btree (reference_number)
"idx_booking_sort_checkin" btree (checkin_date, created_at)
"idx_booking_sort_checkout" btree (checkout_date, created_at)
"idx_booking_stay_dates" btree (checkin_date, checkout_date DESC)
"ix_booking_deleted" btree (deleted)
"ix_booking_status" btree (status)
"trgm_booking_ref_num" gist (reference_number gist_trgm_ops)
Customer table:
Table "public.booking_customer"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
customer_id | character varying | | not null |
booking_id | character varying | | not null |
first_name | character varying | | |
last_name | character varying | | |
phone | character varying | | |
email | character varying | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_customer_pkey" PRIMARY KEY, btree (customer_id, booking_id)
"book_cust_idx" btree (booking_id, customer_id)
"idx_booking_customer_full_name" btree (((first_name::text || ' '::text) || last_name::text))
"ix_booking_customer_deleted" btree (deleted)
"ix_booking_customer_email" btree (email)
"ix_booking_customer_first_name" btree (first_name)
"ix_booking_customer_last_name" btree (last_name)
"ix_booking_customer_phone" btree (phone)
"trgm_cust_first_name" gist (first_name gist_trgm_ops)
"trgm_cust_full_name" gist (((first_name::text || ' '::text) || last_name::text) gist_trgm_ops)
"trgm_cust_last_name" gist (last_name gist_trgm_ops)
I am running the following query:
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123' OR
      EXISTS (
          SELECT 1 FROM booking_customer cust
          WHERE cust.booking_id = bk.booking_id AND (
                    cust.email = '9123889123' OR
                    cust.phone = '9123889123'
                ) AND
                cust.deleted = false
      )
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
This results in the following query plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..365.54 rows=10 width=31) (actual time=57.861..865.883 rows=3 loops=1)
-> Index Scan Backward using idx_booking_sort_checkin on booking bk (cost=0.42..14419601.66 rows=394937 width=31) (actual time=57.858..865.877 rows=3 loops=1)
Filter: (((reference_number)::text = '9916092871'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 676681
SubPlan 1
-> Bitmap Heap Scan on booking_customer cust (cost=14.08..18.10 rows=1 width=0) (never executed)
Recheck Cond: (((booking_id)::text = (bk.booking_id)::text) AND (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text)))
Filter: (NOT deleted)
-> BitmapAnd (cost=14.08..14.08 rows=1 width=0) (never executed)
-> Bitmap Index Scan on book_cust_idx (cost=0.00..4.49 rows=8 width=0) (never executed)
Index Cond: ((booking_id)::text = (bk.booking_id)::text)
-> BitmapOr (cost=9.34..9.34 rows=65 width=0) (never executed)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (never executed)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (never executed)
Index Cond: ((phone)::text = '9916092871'::text)
SubPlan 2
-> Bitmap Heap Scan on booking_customer cust_1 (cost=9.38..264.83 rows=65 width=32) (actual time=0.047..0.050 rows=3 loops=1)
Recheck Cond: (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text))
Filter: (NOT deleted)
Heap Blocks: exact=3
-> BitmapOr (cost=9.38..9.38 rows=65 width=0) (actual time=0.042..0.042 rows=0 loops=1)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (actual time=0.023..0.023 rows=3 loops=1)
Index Cond: ((phone)::text = '9916092871'::text)
Planning time: 0.782 ms
Execution time: 865.956 ms
(28 rows)
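As you can see, Postgres applies a Filter predicate on the reference_number and booking_id columns, both of which I have indexed.
However, if I remove the OR condition from the WHERE clause, it starts using the index.
For this query: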
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123'
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
the query plan looks like this:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.45..8.46 rows=1 width=31) (actual time=0.021..0.021 rows=0 loops=1)
-> Sort (cost=8.45..8.46 rows=1 width=31) (actual time=0.020..0.020 rows=0 loops=1)
Sort Key: checkin_date DESC, created_at DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using ix_booking_reference_number on booking bk (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ((reference_number)::text = '9123889123'::text)
Planning time: 0.334 ms
Execution time: 0.042 ms
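I don't understand why the behaviour changes. Both reference_number and booking_id have unique indexes. Also, what are the two subplans in the first query, and do they affect query performance as well?
I created the gist index on reference_number so that LIKE queries I run elsewhere can use an index.
Is there anything I can change to improve query performance?
I have 500,000 records in the booking table and 2 million in the customer table.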
Answer 0 (score: 2)
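The two queries are completely different, so it is not surprising that they perform differently.
For the first query, it would probably be faster if you told PostgreSQL not to use the index idx_booking_sort_checkin: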
ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC
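For context, here is a sketch of the first query from the question with that ORDER BY expression substituted in. Adding INTERVAL '0 days' leaves the sort order unchanged, but the sort expression no longer matches the columns of idx_booking_sort_checkin, so the planner cannot use a backward scan of that index to satisfy the sort. Whether this is actually faster on your data is an assumption to verify with EXPLAIN ANALYZE.

```sql
-- Sketch only: the question's query with the modified ORDER BY.
-- The literal '9123889123' is the search value used in the question.
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123'
   OR EXISTS (
          SELECT 1
          FROM booking_customer cust
          WHERE cust.booking_id = bk.booking_id
            AND (cust.email = '9123889123' OR cust.phone = '9123889123')
            AND cust.deleted = false
      )
-- "+ INTERVAL '0 days'" does not change the value, only the expression,
-- so (checkin_date, created_at) no longer matches the sort key exactly.
ORDER BY bk.checkin_date DESC, (bk.created_at + INTERVAL '0 days') DESC
LIMIT 10 OFFSET 0;
```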
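The problem is that PostgreSQL thinks the fastest strategy is to scan booking in ORDER BY order using the index until it has found enough rows that satisfy the condition. But it does not know which values the subquery will return, so it cannot be sure that it will find 10 rows quickly.
In reality that estimate is completely wrong: there are only three matching rows in total, so the whole table ends up being scanned this way. The plan shows this, with 676681 rows removed by the filter to return 3.
Using an ORDER BY clause that does not match the index prevents PostgreSQL from using this strategy.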
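Another rewrite that is often tried when an OR spans two tables is to split the query into a UNION, so that each branch can be planned on its own. This is not part of the answer above; it is a sketch of an alternative, and the assumption that the planner will pick ix_booking_reference_number for the first branch and the email/phone indexes for the second should be checked with EXPLAIN ANALYZE.

```sql
-- Sketch only: UNION rewrite of the question's query (an alternative
-- technique, not the approach described in the answer).
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123'
UNION
-- UNION (not UNION ALL) removes duplicates, so a booking that matches
-- both branches is returned once, as in the original OR query.
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE EXISTS (
          SELECT 1
          FROM booking_customer cust
          WHERE cust.booking_id = bk.booking_id
            AND (cust.email = '9123889123' OR cust.phone = '9123889123')
            AND cust.deleted = false
      )
-- ORDER BY and LIMIT apply to the combined result and must use the
-- output column names of the UNION.
ORDER BY checkin_date DESC, created_at DESC
LIMIT 10 OFFSET 0;
```

Because booking_id is the primary key of booking, the duplicate elimination done by UNION cannot merge two distinct bookings.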