Why is this query slow when it uses id IN (subselect that returns no rows)?

Time: 2019-05-30 10:13:35

Tags: sql postgresql

I'm having trouble understanding why this query takes more than 1 ms.

EXPLAIN ANALYZE SELECT AVG("adverts"."price") 
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant') 
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
AND ("adverts"."city_location_id" = 56 
     OR "adverts"."city_location_id" IN (SELECT "city_locations"."id" 
                                         FROM "city_locations" 
                                         WHERE "city_locations"."type" IN ('Arrondissement') 
                                        AND "city_locations"."arrondissement_city_id" = 56));

QUERY PLAN

 Aggregate  (cost=6583.49..6583.50 rows=1 width=32) (actual time=21.702..21.702 rows=1 loops=1)
   ->  Seq Scan on adverts  (cost=6.31..6533.88 rows=19842 width=4) (actual time=0.462..21.684 rows=44 loops=1)
         Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text) AND ((city_location_id = 56) OR (hashed SubPlan 1)))
         Rows Removed by Filter: 46217
         SubPlan 1
           ->  Index Scan using index_city_locations_on_arrondissement_city_id on city_locations  (cost=0.29..6.31 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
                 Index Cond: (arrondissement_city_id = 56)
                 Filter: ((type)::text = 'Arrondissement'::text)
 Planning Time: 0.173 ms
 Execution Time: 21.739 ms

The execution time is 21 ms.

If I run the subquery on its own, I get:

EXPLAIN ANALYZE SELECT "city_locations"."id" FROM "city_locations" WHERE "city_locations"."type" IN ('Arrondissement') AND "city_locations"."arrondissement_city_id" = 56;
 id 
----
(0 rows)

QUERY PLAN

 Index Scan using index_city_locations_on_arrondissement_city_id on city_locations  (cost=0.29..6.31 rows=1 width=8) (actual time=0.028..0.028 rows=0 loops=1)
   Index Cond: (arrondissement_city_id = 56)
   Filter: ((type)::text = 'Arrondissement'::text)
 Planning Time: 0.233 ms
 Execution Time: 0.075 ms

The execution time is 0.075 ms: very fast, and the result is NULL (0 rows).

When I replace the subquery with its result, NULL, the query is very fast.

EXPLAIN ANALYZE SELECT AVG("adverts"."price") 
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant') 
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
AND ("adverts"."city_location_id" = 56 
     OR "adverts"."city_location_id" IN (NULL));

QUERY PLAN

 Aggregate  (cost=162.66..162.67 rows=1 width=32) (actual time=0.309..0.310 rows=1 loops=1)
   ->  Bitmap Heap Scan on adverts  (cost=4.72..162.55 rows=42 width=4) (actual time=0.082..0.278 rows=44 loops=1)
         Recheck Cond: (city_location_id = 56)
         Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text))
         Heap Blocks: exact=42
         ->  Bitmap Index Scan on index_adverts_on_city_location_id_and_visible  (cost=0.00..4.71 rows=42 width=0) (actual time=0.043..0.044 rows=44 loops=1)
               Index Cond: ((city_location_id = 56) AND (visible = true))
 Planning Time: 0.395 ms
 Execution Time: 0.412 ms

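The execution time is 0.412 ms.

My question is: why is the first query slow when each of its parts is fast on its own?

Am I missing some optimization because of the WHERE ... IN clause?

3 answers:

Answer 0: (score: 1)

First: simplify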


EXPLAIN ANALYZE 
SELECT AVG(a.price) 
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant') 
AND a.discarded_at IS NULL
AND a.visible = true 
AND (a.city_location_id = 56 
     OR a.city_location_id IN (
        SELECT c.id
        FROM city_locations c
        WHERE c.type IN ('Arrondissement')                                  
        AND c.arrondissement_city_id = 56))
        ;

Next: rewrite IN (...) as EXISTS (...)


EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant') 
AND a.discarded_at IS NULL
AND a.visible = true 
AND (a.city_location_id = 56 
     OR EXISTS(
        SELECT *
        FROM city_locations c
        WHERE a.city_location_id = c.id 
        AND c.type IN ('Arrondissement')
        AND c.arrondissement_city_id = 56))
        ;

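Now you can push the ugly OR down into the subquery (assuming the subquery has low cardinality):

-> the optimizer is probably not smart enough to push the OR term down on its own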


EXPLAIN ANALYZE
SELECT AVG(a.price)  
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant') 
AND a.discarded_at IS NULL
AND a.visible = true 
AND EXISTS(
        SELECT *
        FROM city_locations c
        WHERE a.city_location_id = c.id
        AND (c.type IN ('Arrondissement') AND c.arrondissement_city_id = 56
            OR c.id = 56  -- the city itself; assumes city_locations has a row with id 56
                ))
        ;

If the subquery's result set is small enough, you can try moving it into a CTE.
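If none of the rewrites above changes the plan, another option is to remove the OR altogether by splitting the query into two branches and averaging over their union. This is only a sketch built from the table and column names in the question. The alias s and the extra <> 56 condition are my additions; the latter keeps a row from being counted in both branches.

```sql
-- Sketch: split the OR into two branches so each one can be planned on its own.
EXPLAIN ANALYZE
SELECT AVG(s.price)
FROM (
    -- Branch 1: adverts attached directly to city 56
    SELECT a.price
    FROM adverts a
    WHERE a.type IN ('Businesses::Restaurant')
    AND a.discarded_at IS NULL
    AND a.visible = true
    AND a.city_location_id = 56
    UNION ALL
    -- Branch 2: adverts attached to an arrondissement of city 56
    SELECT a.price
    FROM adverts a
    WHERE a.type IN ('Businesses::Restaurant')
    AND a.discarded_at IS NULL
    AND a.visible = true
    AND a.city_location_id <> 56   -- avoid counting a row in both branches
    AND a.city_location_id IN (
        SELECT c.id
        FROM city_locations c
        WHERE c.type IN ('Arrondissement')
        AND c.arrondissement_city_id = 56)
) s;
```

Each branch has a plain equality or a plain IN on city_location_id, so the planner is free to pick an index scan for either one. Whether it actually does depends on your statistics, so compare the EXPLAIN ANALYZE output with the original.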

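Answer 1: (score: 0)

What seems to be happening in your case is that the subquery is executed once for every row in the parent result set, so its run time is multiplied by the number of rows in the parent result set.

One way to rewrite this query to speed it up is to use a WITH clause at the start of the query: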

WITH cities AS (
    SELECT "city_locations"."id" AS id
    FROM "city_locations" 
    WHERE "city_locations"."type" IN ('Arrondissement') 
        AND "city_locations"."arrondissement_city_id" = 56
)
SELECT AVG("adverts"."price") 
FROM "adverts" 
WHERE "adverts"."type" IN ('Businesses::Restaurant') 
    AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
    AND ("adverts"."city_location_id" = 56 
    OR "adverts"."city_location_id" IN (SELECT id FROM cities));

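This may speed things up when there are few or no matching records, but it still executes a SELECT for every record. It only cuts down on direct hits against, and filtering of, the city_locations table.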
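A variation worth trying, which is not part of the rewrite above, is to wrap the subquery in ARRAY(...) and compare with = ANY. The intent is that PostgreSQL evaluates the subquery once up front and then treats the comparison like a list of constants. This is a sketch under that assumption; whether the planner then uses the index on city_location_id depends on your version and statistics, so check the plan.

```sql
-- Sketch: collapse the subquery into an array that is computed once.
EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND (a.city_location_id = 56
     OR a.city_location_id = ANY (ARRAY(
        SELECT c.id
        FROM city_locations c
        WHERE c.type IN ('Arrondissement')
        AND c.arrondissement_city_id = 56)));
```

When the subquery returns no rows, ARRAY(...) yields an empty array and the = ANY test is simply false, so the result matches the original query.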

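Answer 2: (score: 0)

I think you may be assuming that the query inside the IN is always executed first, as if it were standalone, and that its result is then fed to the outer query, at which point "it should realize the result is null, cannot produce any matches, and quit early". So you expect the large query to take about as long as the database needs to determine that the small query yields NULL. In reality, the large query is rewritten by the database optimizer, so the way it executes differs from the conceptual execution model in your head. This time the database chose a suboptimal plan and spent much longer combining the data before finding out that the subquery result was empty.

This is a very different scenario from hardcoding NULL, as in SELECT * FROM table WHERE x IN (null). There will be a specific optimization that recognizes this as a no-op, and the planner may even find that an always-false operation like this does not need to be executed at all. If you are curious enough, the MySQL manual covers some fascinating details about query optimization :)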
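To make the difference concrete: a subquery that returns 0 rows is not the same thing as a literal NULL, even though both end up rejecting rows in a WHERE clause. A small PostgreSQL check (the column aliases are made up for illustration):

```sql
-- A literal NULL in the list gives "unknown"; an empty subquery gives false.
SELECT 56 IN (NULL)                 AS in_null_literal,    -- NULL (unknown), never true
       56 IN (SELECT 1 WHERE false) AS in_empty_subquery;  -- false
```

The literal case is known at planning time, which is why the IN (NULL) term has disappeared entirely from the Filter line of the fast plan in the question. The subquery case is only known at execution time, after the plan with the sequential scan has already been chosen.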