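I currently have a PostgreSQL query that is slow because of an OR condition. The OR apparently keeps it from using an index. So far, my attempts to rewrite the query have failed.
The query: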
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE a0_.advert_category_id IN ( 1136 )
OR a1_.parent_id IN ( 1136 )
ORDER BY a0_.created_date DESC
LIMIT 15;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.658..50.809 rows=15 loops=1)
-> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.657..50.790 rows=15 loops=1)
-> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.013..16.449 rows=12405 loops=1)
-> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
Index Cond: (id = a0_.advert_category_id)
Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
Rows Removed by Filter: 1
Total runtime: 50.860 ms
The cause of the slowness: Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
I tried putting the conditions in the INNER JOIN instead of the WHERE clause:
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
AND ( a0_.advert_category_id IN ( 1136 )
OR a1_.parent_id IN ( 1136 ) )
ORDER BY a0_.created_date DESC
LIMIT 15;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=4.667..139.955 rows=15 loops=1)
-> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=4.666..139.932 rows=15 loops=1)
-> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.019..100.765 rows=12405 loops=1)
-> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
Index Cond: (id = a0_.advert_category_id)
Filter: ((a0_.advert_category_id = 1136) OR (parent_id = 1136))
Rows Removed by Filter: 1
Total runtime: 140.048 ms
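When I remove either one of the OR conditions, the query speeds up. So I wrote a UNION to see the result. It is very fast! But I do not consider this a solution: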
EXPLAIN ANALYZE
(SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE a0_.advert_category_id IN ( 1136 )
ORDER BY a0_.created_date DESC
LIMIT 15)
UNION
(SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE a1_.parent_id IN ( 1136 )
ORDER BY a0_.created_date DESC
LIMIT 15);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=4125.70..4126.00 rows=30 width=12) (actual time=7.945..7.951 rows=15 loops=1)
-> Append (cost=1120.82..4125.63 rows=30 width=12) (actual time=6.811..7.929 rows=15 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=1120.82..1121.01 rows=15 width=12) (actual time=6.810..6.840 rows=15 loops=1)
-> Limit (cost=1120.82..1120.86 rows=15 width=12) (actual time=6.809..6.825 rows=15 loops=1)
-> Sort (cost=1120.82..1121.56 rows=295 width=12) (actual time=6.807..6.813 rows=15 loops=1)
Sort Key: a0_.created_date
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=10.59..1113.59 rows=295 width=12) (actual time=1.151..6.639 rows=220 loops=1)
-> Index Only Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.030..1.033 rows=1 loops=1)
Index Cond: (id = 1136)
Heap Fetches: 1
-> Bitmap Heap Scan on advert a0_ (cost=10.59..1102.37 rows=295 width=16) (actual time=0.099..5.287 rows=220 loops=1)
Recheck Cond: (advert_category_id = 1136)
-> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..10.51 rows=295 width=0) (actual time=0.073..0.073 rows=220 loops=1)
Index Cond: (advert_category_id = 1136)
-> Subquery Scan on "*SELECT* 2" (cost=3004.43..3004.62 rows=15 width=12) (actual time=1.072..1.072 rows=0 loops=1)
-> Limit (cost=3004.43..3004.47 rows=15 width=12) (actual time=1.071..1.071 rows=0 loops=1)
-> Sort (cost=3004.43..3005.99 rows=626 width=12) (actual time=1.069..1.069 rows=0 loops=1)
Sort Key: a0_.created_date
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=22.91..2989.07 rows=626 width=12) (actual time=1.056..1.056 rows=0 loops=1)
-> Index Scan using idx_d84ab8ea727aca70 on advertcategory a1_ (cost=0.00..8.27 rows=1 width=4) (actual time=1.054..1.054 rows=0 loops=1)
Index Cond: (parent_id = 1136)
-> Bitmap Heap Scan on advert a0_ (cost=22.91..2972.27 rows=853 width=16) (never executed)
Recheck Cond: (advert_category_id = a1_.id)
-> Bitmap Index Scan on idx_54f1f40bd4436821 (cost=0.00..22.70 rows=853 width=0) (never executed)
Index Cond: (advert_category_id = a1_.id)
Total runtime: 8.940 ms
(28 rows)
Tried reversing the IN statement:
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE 1136 IN ( a0_.advert_category_id, a1_.parent_id )
ORDER BY a0_.created_date DESC
LIMIT 15;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..27542.49 rows=15 width=12) (actual time=1.848..62.461 rows=15 loops=1)
-> Nested Loop (cost=0.00..1691109.07 rows=921 width=12) (actual time=1.847..62.441 rows=15 loops=1)
-> Index Scan Backward using advert_created_date_idx on advert a0_ (cost=0.00..670300.17 rows=353804 width=16) (actual time=0.028..27.316 rows=12405 loops=1)
-> Index Scan using advertcategory_pkey on advertcategory a1_ (cost=0.00..2.88 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=12405)
Index Cond: (id = a0_.advert_category_id)
Filter: ((1136 = a0_.advert_category_id) OR (1136 = parent_id))
Rows Removed by Filter: 1
Total runtime: 62.506 ms
(8 rows)
Tried using EXISTS:
EXPLAIN ANALYZE SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE EXISTS(SELECT test.id
FROM advert test
INNER JOIN advertcategory test_cat
ON test_cat.id = test.advert_category_id
WHERE test.id = a0_.id
AND ( test.advert_category_id IN ( 1136 )
OR test_cat.parent_id IN ( 1136 ) ))
ORDER BY a0_.created_date DESC
LIMIT 15;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=45538.18..45538.22 rows=15 width=12) (actual time=524.654..524.673 rows=15 loops=1)
-> Sort (cost=45538.18..45540.48 rows=921 width=12) (actual time=524.651..524.658 rows=15 loops=1)
Sort Key: a0_.created_date
Sort Method: top-N heapsort Memory: 25kB
-> Hash Join (cost=39803.59..45515.58 rows=921 width=12) (actual time=497.362..524.436 rows=220 loops=1)
Hash Cond: (a0_.advert_category_id = a1_.id)
-> Nested Loop (cost=39786.88..45486.21 rows=921 width=16) (actual time=496.748..523.501 rows=220 loops=1)
-> HashAggregate (cost=39786.88..39796.09 rows=921 width=4) (actual time=496.705..496.872 rows=220 loops=1)
-> Hash Join (cost=16.71..39784.58 rows=921 width=4) (actual time=1.210..496.294 rows=220 loops=1)
Hash Cond: (test.advert_category_id = test_cat.id)
Join Filter: ((test.advert_category_id = 1136) OR (test_cat.parent_id = 1136))
Rows Removed by Join Filter: 353584
-> Seq Scan on advert test (cost=0.00..33134.04 rows=353804 width=8) (actual time=0.002..177.953 rows=353804 loops=1)
-> Hash (cost=9.65..9.65 rows=565 width=8) (actual time=0.622..0.622 rows=565 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 22kB
-> Seq Scan on advertcategory test_cat (cost=0.00..9.65 rows=565 width=8) (actual time=0.005..0.327 rows=565 loops=1)
-> Index Scan using advert_pkey on advert a0_ (cost=0.00..6.17 rows=1 width=16) (actual time=0.117..0.118 rows=1 loops=220)
Index Cond: (id = test.id)
-> Hash (cost=9.65..9.65 rows=565 width=4) (actual time=0.604..0.604 rows=565 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 20kB
-> Seq Scan on advertcategory a1_ (cost=0.00..9.65 rows=565 width=4) (actual time=0.010..0.285 rows=565 loops=1)
Total runtime: 524.797 ms
Advert table (stripped down):
353804 rows
Table "public.advert"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('advert_id_seq'::regclass) | plain | |
advert_category_id | integer | not null | plain | |
Indexes:
"idx_54f1f40bd4436821" btree (advert_category_id)
"advert_created_date_idx" btree (created_date)
Foreign-key constraints:
"fk_54f1f40bd4436821" FOREIGN KEY (advert_category_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Has OIDs: no
Category table (stripped down):
565 rows
Table "public.advertcategory"
Column | Type | Modifiers
-----------+---------+-------------------------------------------------------------
id | integer | not null default nextval('advertcategory_id_seq'::regclass)
parent_id | integer |
active | boolean | not null
system | boolean | not null
Indexes:
"advertcategory_pkey" PRIMARY KEY, btree (id)
"idx_d84ab8ea727aca70" btree (parent_id)
Foreign-key constraints:
"fk_d84ab8ea727aca70" FOREIGN KEY (parent_id) REFERENCES advertcategory(id) ON DELETE RESTRICT
Short server config:
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
name | current_setting | source
----------------------------+--------------------+----------------------
shared_buffers | 1800MB | configuration file
work_mem | 4MB | configuration file
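As you can see, none of the straightforward rewrites improves the speed. Only the UNION version, which splits the OR condition, performs better. But I cannot use it, because this query is generated by my ORM framework and combined with many other filter options. Besides, if I can do this, why doesn't the optimizer do it? It seems like a really simple optimization.
Any hints on this? A solution to this small problem would be greatly appreciated!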
Answer 0 (score: 4)
An entirely new approach. Your where conditions are on two tables, but that seems unnecessary.
The first change would be:
where a1_.id = 1136 or a1_.parent_id = 1136
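I think the structure you want is a scan of the category table followed by fetches from the advert table. To help with that, you can create an index on advert(advert_category_id, created_date).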
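The index suggested above could be created roughly as follows. This is a minimal sketch: the index name advert_category_created_idx is a hypothetical choice, not something from the original post. With advert_category_id as the leading column, rows for a single category can be read already ordered by created_date, and a btree index can be scanned backward for the DESC ordering.

```sql
-- Hypothetical index name; the columns are the ones suggested in the answer.
-- Leading column filters by category, second column supplies the sort order.
CREATE INDEX advert_category_created_idx
    ON advert (advert_category_id, created_date);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE advert;
```

Whether the planner actually picks this index depends on how selective the category is, so it is worth re-running EXPLAIN ANALYZE on the real data afterwards.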
I would be tempted to write the query by moving the where clause into a subquery. I do not know whether this affects performance:
SELECT a0_.id AS id0
FROM advert a0_ INNER JOIN
(select ac.*
from advertcategory ac
where ac.id = 1136 or ac.parent_id = 1136
) ac
ON a0_.advert_category_id = ac.id
ORDER BY a0_.created_date DESC
LIMIT 15;
Answer 1 (score: 3)
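"If I can do this, why doesn't the optimizer do it?" Because there are all sorts of cases where doing so is not necessarily valid (due to aggregates in subqueries) or worthwhile (due to a better index being available).
The best query plan you could probably get is the one given in Gordon's answer, using union all instead of union to avoid the sort (I assume that a category is never its own parent, which eliminates any possibility of duplicates).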
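As a sketch of the union all idea, the UNION query from the question could be adapted as shown here. Two things are assumptions on my part rather than anything stated in the thread: the outer ORDER BY and LIMIT (each branch returns up to 15 rows, so the combined result has to be re-sorted and trimmed), and the derived-table alias u. It also relies on a category never being its own parent, so that the two branches cannot return the same advert.

```sql
SELECT u.id0
FROM (
    -- Adverts placed directly in category 1136.
    (SELECT a0_.id AS id0, a0_.created_date
     FROM advert a0_
     INNER JOIN advertcategory a1_
             ON a0_.advert_category_id = a1_.id
     WHERE a0_.advert_category_id = 1136
     ORDER BY a0_.created_date DESC
     LIMIT 15)
    UNION ALL
    -- Adverts placed in a direct child of category 1136.
    (SELECT a0_.id AS id0, a0_.created_date
     FROM advert a0_
     INNER JOIN advertcategory a1_
             ON a0_.advert_category_id = a1_.id
     WHERE a1_.parent_id = 1136
     ORDER BY a0_.created_date DESC
     LIMIT 15)
) AS u                        -- hypothetical alias
ORDER BY u.created_date DESC  -- merge the two top-15 lists
LIMIT 15;
```

Because each branch is already limited to 15 rows, the outer sort handles at most 30 rows.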
Otherwise, note that your query can be rewritten like this:
SELECT a0_.id AS id0
FROM advert a0_
INNER JOIN advertcategory a1_
ON a0_.advert_category_id = a1_.id
WHERE a1_.id IN ( 1136 )
OR a1_.parent_id IN ( 1136 )
ORDER BY a0_.created_date DESC
LIMIT 15;
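In other words, you are filtering based on criteria from one table, and sorting/limiting based on another. The way you wrote it prevents the use of a good index: the planner does not recognize that the filter criteria all come from the same table, so it runs a nested loop over created_date with a filter, which is what you are currently seeing. It is not a bad plan, mind you. It is actually the right thing to do if, for example, 1136 is not a very selective criterion.
By making it explicit that the second table is the one of interest, you may end up with a bitmap heap scan when the category is selective enough, provided you have indexes on advertcategory (id) (which you already have if it is the primary key) and advertcategory (parent_id) (which you probably do not have yet). Do not count on it too much, though: as far as I know, PG does not collect statistics on correlated columns.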
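To check which of those indexes already exist, the pg_indexes system view can be queried as in the sketch below. Judging by the table description in the question, both appear to be present already (advertcategory_pkey on id and idx_d84ab8ea727aca70 on parent_id), so this is only a verification step.

```sql
-- List every index defined on the category table, with its definition.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename  = 'advertcategory';
```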
Another possibility would be to maintain an array of the aggregated categories directly in advert (using triggers), and to use a GIST index on it:
SELECT a0_.id AS id0
FROM advert a0_
WHERE ARRAY[1136, 1137] && a0_.category_ids -- 1136 OR 1137; use <@ for AND
ORDER BY a0_.created_date DESC
LIMIT 15;
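It is technically redundant, but it works well for optimizing this kind of query (i.e. filters on nested category trees that yield complex join criteria). When PG decides to use it, you will end up top-N sorting the applicable adverts. (In older PG versions, the selectivity of && was arbitrary due to a lack of statistics. I vaguely recall reading a changelog in which 9.1, 9.2 or 9.3 improved things, presumably by using code for generic array types similar to that used by the tsvector content statistics collector. At any rate, be sure to use the latest PG version, and be sure not to rewrite the query using operators that the gin/gist index cannot use.)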
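The array-plus-trigger approach might be set up roughly as in this sketch. It rests on several assumptions that are not in the original answer: the function, trigger and index names are hypothetical, the tree is assumed to be only one level deep (a category plus its direct parent), and a GIN index is used because it supports && on integer[] out of the box, whereas a GiST index on integer arrays needs the intarray extension.

```sql
-- Denormalized column: the advert's own category plus that category's parent.
ALTER TABLE advert ADD COLUMN category_ids integer[];

-- Hypothetical trigger function that fills category_ids from advertcategory.
CREATE FUNCTION advert_set_category_ids() RETURNS trigger AS $$
BEGIN
    SELECT CASE WHEN c.parent_id IS NULL
                THEN ARRAY[c.id]
                ELSE ARRAY[c.id, c.parent_id]
           END
      INTO NEW.category_ids
      FROM advertcategory c
     WHERE c.id = NEW.advert_category_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Recompute the array whenever an advert is inserted or moved to another category.
CREATE TRIGGER advert_category_ids_trg
    BEFORE INSERT OR UPDATE OF advert_category_id ON advert
    FOR EACH ROW EXECUTE PROCEDURE advert_set_category_ids();

-- Backfill existing rows.
UPDATE advert a
   SET category_ids = CASE WHEN c.parent_id IS NULL
                           THEN ARRAY[c.id]
                           ELSE ARRAY[c.id, c.parent_id]
                      END
  FROM advertcategory c
 WHERE c.id = a.advert_category_id;

-- Index that supports the && (overlap) operator on integer[].
CREATE INDEX advert_category_ids_idx ON advert USING gin (category_ids);
```

This sketch does not handle a category being moved to a different parent. That case would need a second trigger on advertcategory to refresh the affected adverts, and deeper trees would need the full ancestor chain rather than just the direct parent.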