I'm struggling to understand why this query takes more than 1 ms.
EXPLAIN ANALYZE SELECT AVG("adverts"."price")
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant')
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true
AND ("adverts"."city_location_id" = 56
OR "adverts"."city_location_id" IN (SELECT "city_locations"."id"
FROM "city_locations"
WHERE "city_locations"."type" IN ('Arrondissement')
AND "city_locations"."arrondissement_city_id" = 56));
QUERY PLAN
Aggregate (cost=6583.49..6583.50 rows=1 width=32) (actual time=21.702..21.702 rows=1 loops=1)
-> Seq Scan on adverts (cost=6.31..6533.88 rows=19842 width=4) (actual time=0.462..21.684 rows=44 loops=1)
Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text) AND ((city_location_id = 56) OR (hashed SubPlan 1)))
Rows Removed by Filter: 46217
SubPlan 1
-> Index Scan using index_city_locations_on_arrondissement_city_id on city_locations (cost=0.29..6.31 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (arrondissement_city_id = 56)
Filter: ((type)::text = 'Arrondissement'::text)
Planning Time: 0.173 ms
Execution Time: 21.739 ms
Execution time is 21 ms.
If I run the subquery on its own, I get:
EXPLAIN ANALYZE SELECT "city_locations"."id" FROM "city_locations" WHERE "city_locations"."type" IN ('Arrondissement') AND "city_locations"."arrondissement_city_id" = 56;
id
----
(0 rows)
QUERY PLAN
Index Scan using index_city_locations_on_arrondissement_city_id on city_locations (cost=0.29..6.31 rows=1 width=8) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: (arrondissement_city_id = 56)
Filter: ((type)::text = 'Arrondissement'::text)
Planning Time: 0.233 ms
Execution Time: 0.075 ms
Execution time: 0.075 ms. Super fast, and the result is NULL (0 rows).
When I replace the subquery with its result, NULL, the query is very fast.
EXPLAIN ANALYZE SELECT AVG("adverts"."price")
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant')
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true
AND ("adverts"."city_location_id" = 56
OR "adverts"."city_location_id" IN (NULL));
QUERY PLAN
Aggregate (cost=162.66..162.67 rows=1 width=32) (actual time=0.309..0.310 rows=1 loops=1)
-> Bitmap Heap Scan on adverts (cost=4.72..162.55 rows=42 width=4) (actual time=0.082..0.278 rows=44 loops=1)
Recheck Cond: (city_location_id = 56)
Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text))
Heap Blocks: exact=42
-> Bitmap Index Scan on index_adverts_on_city_location_id_and_visible (cost=0.00..4.71 rows=42 width=0) (actual time=0.043..0.044 rows=44 loops=1)
Index Cond: ((city_location_id = 56) AND (visible = true))
Planning Time: 0.395 ms
Execution Time: 0.412 ms
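Execution time is 0.412 ms.
My question is: why is the first query slow when each of its parts is fast on its own?
Am I missing some optimization because of the WHERE IN clause?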
Answer 0 (score: 1)
First: simplify
EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND (a.city_location_id = 56
OR a.city_location_id IN (
SELECT c.id
FROM city_locations c
WHERE c.type IN ('Arrondissement')
AND c.arrondissement_city_id = 56))
;
Next: rewrite IN (...) as EXISTS (...)
EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND (a.city_location_id = 56
OR EXISTS(
SELECT *
FROM city_locations c
WHERE a.city_location_id = c.id
AND c.type IN ('Arrondissement')
AND c.arrondissement_city_id = 56))
;
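Now you can push the ugly OR into the subquery (assuming the subquery has low cardinality).
The optimizer is probably not smart enough to push the OR term down by itself: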
EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND EXISTS(
SELECT *
FROM city_locations c
WHERE a.city_location_id = c.id
AND (c.type IN ('Arrondissement') AND c.arrondissement_city_id = 56
OR c.id = 56 -- stands in for a.city_location_id = 56; assumes row 56 exists in city_locations
)
)
;
If the subquery's result set is small enough, you can try moving it into a CTE.
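A minimal sketch of what that CTE variant could look like. It assumes PostgreSQL 12 or later (for the MATERIALIZED keyword), that city_locations.id is the primary key so the join cannot duplicate adverts, and that a row with id 56 exists in city_locations. The CTE name wanted is made up for illustration. Whether it is actually faster depends on the plan chosen, so compare the EXPLAIN ANALYZE output against the original query.

```sql
EXPLAIN ANALYZE
WITH wanted AS MATERIALIZED (   -- hypothetical name; MATERIALIZED needs PostgreSQL 12+
    -- The city itself plus its arrondissements, computed once up front.
    SELECT c.id
    FROM city_locations c
    WHERE c.id = 56
       OR (c.type IN ('Arrondissement') AND c.arrondissement_city_id = 56)
)
SELECT AVG(a.price)
FROM adverts a
JOIN wanted w ON w.id = a.city_location_id   -- assumes city_locations.id is unique
WHERE a.type IN ('Businesses::Restaurant')
  AND a.discarded_at IS NULL
  AND a.visible = true
;
```

With the OR gone from the outer WHERE clause, the planner is free to consider an index on adverts.city_location_id for the join, though it is not guaranteed to pick one.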
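Answer 1 (score: 0)
What seems to be happening in your case is that the subquery is executed once for each row of the parent result set. So you take the subquery's run time and multiply it by the number of records in the parent result set.
One way to rewrite this query to speed it up is to use a WITH clause at the start of the query: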
WITH cities AS (
SELECT "city_locations"."id" AS id
FROM "city_locations"
WHERE "city_locations"."type" IN ('Arrondissement')
AND "city_locations"."arrondissement_city_id" = 56
)
SELECT AVG("adverts"."price")
FROM "adverts"
WHERE "adverts"."type" IN ('Businesses::Restaurant')
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true
AND ("adverts"."city_location_id" = 56
OR "adverts"."city_location_id" IN (SELECT id FROM cities));
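This may speed things up when there are no records or only a few, but it is still executing a SELECT for each record. It only reduces the likelihood of going to the city_locations table directly and filtering it.
Answer 2 (score: 0)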
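I think you may be assuming that the query inside IN is definitely executed first, as if it were standalone, and that its result is then fed to the outer query, at which point "it should realize the result is null, so nothing will match, and bail out early". You therefore expect the big query to take about as long as the DB needs to decide that the small query yields NULL. In reality, the big query is rewritten by the DB optimizer in some way, so it is executed differently from the conceptual execution model you have in your head. This time the database picked a suboptimal plan for it and spent longer working through the data before realizing the result was empty.
This is a very different scenario from hard-coding NULL, as in SELECT * FROM table WHERE x IN (null). There will be a specific optimization that determines this is a no-op, and the optimizer may even find that a query containing such an always-false operation does not need to be executed at all. If you are curious enough, the MySQL manual has some fascinating details on query optimization :)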
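To make the difference concrete: a subquery that returns no rows is not the same thing as a literal NULL in the list. The statements below are a small sketch that can be run in psql. They use only constants and do not touch the tables from the question.

```sql
-- A literal NULL in the list: the comparison is unknown, so the result is NULL.
SELECT 56 IN (NULL);                    -- NULL

-- A subquery that returns no rows: the result is false, not NULL.
SELECT 56 IN (SELECT 1 WHERE false);    -- false

-- Either way, the OR branch adds nothing when the other side is already true.
SELECT (56 = 56) OR (56 IN (NULL));     -- true
```

In the hard-coded version the planner can see at plan time that IN (NULL) can never be true, which matches the fast plan in the question, where only city_location_id = 56 survives as the index condition. With a subquery, the contents of the list are only known once the query is running.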