Question

I'm having trouble understanding why this query takes more than 1 ms.

EXPLAIN ANALYZE SELECT AVG("adverts"."price") 
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant') 
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
AND ("adverts"."city_location_id" = 56 
     OR "adverts"."city_location_id" IN (SELECT "city_locations"."id" 
                                         FROM "city_locations" 
                                         WHERE "city_locations"."type" IN ('Arrondissement') 
                                        AND "city_locations"."arrondissement_city_id" = 56));

QUERY PLAN

 Aggregate  (cost=6583.49..6583.50 rows=1 width=32) (actual time=21.702..21.702 rows=1 loops=1)
   ->  Seq Scan on adverts  (cost=6.31..6533.88 rows=19842 width=4) (actual time=0.462..21.684 rows=44 loops=1)
         Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text) AND ((city_location_id = 56) OR (hashed SubPlan 1)))
         Rows Removed by Filter: 46217
         SubPlan 1
           ->  Index Scan using index_city_locations_on_arrondissement_city_id on city_locations  (cost=0.29..6.31 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
                 Index Cond: (arrondissement_city_id = 56)
                 Filter: ((type)::text = 'Arrondissement'::text)
 Planning Time: 0.173 ms
 Execution Time: 21.739 ms

Execution time: 21 ms

If I run the subquery on its own, I get:

EXPLAIN ANALYZE SELECT "city_locations"."id" FROM "city_locations" WHERE "city_locations"."type" IN ('Arrondissement') AND "city_locations"."arrondissement_city_id" = 56;
 id 
----
(0 rows)

QUERY PLAN

 Index Scan using index_city_locations_on_arrondissement_city_id on city_locations  (cost=0.29..6.31 rows=1 width=8) (actual time=0.028..0.028 rows=0 loops=1)
   Index Cond: (arrondissement_city_id = 56)
   Filter: ((type)::text = 'Arrondissement'::text)
 Planning Time: 0.233 ms
 Execution Time: 0.075 ms

Execution time: 0.075 ms. Super fast, and the result is NULL (no rows).

When I replace the subquery with its result, NULL, the query is very fast:

EXPLAIN ANALYZE SELECT AVG("adverts"."price") 
FROM "adverts" WHERE "adverts"."type" IN ('Businesses::Restaurant') 
AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
AND ("adverts"."city_location_id" = 56 
     OR "adverts"."city_location_id" IN (NULL));

QUERY PLAN

 Aggregate  (cost=162.66..162.67 rows=1 width=32) (actual time=0.309..0.310 rows=1 loops=1)
   ->  Bitmap Heap Scan on adverts  (cost=4.72..162.55 rows=42 width=4) (actual time=0.082..0.278 rows=44 loops=1)
         Recheck Cond: (city_location_id = 56)
         Filter: ((discarded_at IS NULL) AND visible AND ((type)::text = 'Businesses::Restaurant'::text))
         Heap Blocks: exact=42
         ->  Bitmap Index Scan on index_adverts_on_city_location_id_and_visible  (cost=0.00..4.71 rows=42 width=0) (actual time=0.043..0.044 rows=44 loops=1)
               Index Cond: ((city_location_id = 56) AND (visible = true))
 Planning Time: 0.395 ms
 Execution Time: 0.412 ms

Execution time: 0.412 ms

My question is: why is the first query slow when its subquery is fast on its own?

Am I missing some optimization because of the WHERE ... IN clause?

Answer 1

First, simplify (use table aliases):

EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND (a.city_location_id = 56
     OR a.city_location_id IN (
        SELECT c.id
        FROM city_locations c
        WHERE c.type IN ('Arrondissement')
        AND c.arrondissement_city_id = 56))
        ;

Next, rewrite IN (...) as EXISTS (...):

EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant') 
AND a.discarded_at IS NULL
AND a.visible = true 
AND (a.city_location_id = 56 
     OR EXISTS(
        SELECT *
        FROM city_locations c
        WHERE a.city_location_id = c.id 
        AND c.type IN ('Arrondissement')
        AND c.arrondissement_city_id = 56))
        ;
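A minimal sketch, assuming a tiny hypothetical data set and using Python's built-in sqlite3 module as a stand-in for PostgreSQL, that checks the IN (...) and EXISTS (...) forms above return the same average. It only demonstrates that the rewrite is semantically equivalent; it says nothing about which plan PostgreSQL would choose.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for PostgreSQL here.
SETUP = """
CREATE TABLE city_locations (id INTEGER PRIMARY KEY, type TEXT,
                             arrondissement_city_id INTEGER);
CREATE TABLE adverts (id INTEGER PRIMARY KEY, price INTEGER, type TEXT,
                      discarded_at TEXT, visible INTEGER,
                      city_location_id INTEGER);
INSERT INTO city_locations VALUES
    (56, 'City', NULL), (57, 'Arrondissement', 56), (58, 'City', NULL);
INSERT INTO adverts VALUES
    (1, 10, 'Businesses::Restaurant', NULL, 1, 56),  -- city 56 itself
    (2, 20, 'Businesses::Restaurant', NULL, 1, 57),  -- arrondissement of 56
    (3, 99, 'Businesses::Restaurant', NULL, 1, 58),  -- unrelated city
    (4, 50, 'Businesses::Hotel',      NULL, 1, 56);  -- wrong type
"""

# Shared filter; visible = 1 replaces PostgreSQL's visible = true.
BASE = """
SELECT AVG(a.price) FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
  AND a.discarded_at IS NULL
  AND a.visible = 1
"""

IN_QUERY = BASE + """
  AND (a.city_location_id = 56
       OR a.city_location_id IN (
            SELECT c.id FROM city_locations c
            WHERE c.type IN ('Arrondissement')
              AND c.arrondissement_city_id = 56))
"""

EXISTS_QUERY = BASE + """
  AND (a.city_location_id = 56
       OR EXISTS (
            SELECT * FROM city_locations c
            WHERE a.city_location_id = c.id
              AND c.type IN ('Arrondissement')
              AND c.arrondissement_city_id = 56))
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
in_avg = conn.execute(IN_QUERY).fetchone()[0]
exists_avg = conn.execute(EXISTS_QUERY).fetchone()[0]
print(in_avg, exists_avg)  # 15.0 15.0 (adverts 1 and 2 match)
```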

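Now you can push the ugly OR into the subquery (assuming the subquery has low cardinality). This is only equivalent if city_locations contains a row with id = 56, for example because adverts.city_location_id is a foreign key to city_locations.id:

-> The optimizer may not be smart enough to see through the OR.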

EXPLAIN ANALYZE
SELECT AVG(a.price)
FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
AND a.discarded_at IS NULL
AND a.visible = true
AND EXISTS(
        SELECT *
        FROM city_locations c
        WHERE a.city_location_id = c.id
        AND ((c.type IN ('Arrondissement') AND c.arrondissement_city_id = 56)
             OR c.id = 56)
        );
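The sketch below, again using sqlite3 and hypothetical toy data as a stand-in for PostgreSQL, compares the original query with the pushed-down version, first with a city_locations row for id 56 present and then after deleting it. It illustrates that the rewrite depends on that row existing; the schema and values are assumptions, not the asker's data.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for PostgreSQL here.
# No foreign key is declared, so the row for city 56 can be deleted below.
SETUP = """
CREATE TABLE city_locations (id INTEGER PRIMARY KEY, type TEXT,
                             arrondissement_city_id INTEGER);
CREATE TABLE adverts (id INTEGER PRIMARY KEY, price INTEGER, type TEXT,
                      discarded_at TEXT, visible INTEGER,
                      city_location_id INTEGER);
INSERT INTO city_locations VALUES
    (56, 'City', NULL), (57, 'Arrondissement', 56), (58, 'City', NULL);
INSERT INTO adverts VALUES
    (1, 10, 'Businesses::Restaurant', NULL, 1, 56),
    (2, 20, 'Businesses::Restaurant', NULL, 1, 57),
    (3, 99, 'Businesses::Restaurant', NULL, 1, 58);
"""

BASE = """
SELECT AVG(a.price) FROM adverts a
WHERE a.type IN ('Businesses::Restaurant')
  AND a.discarded_at IS NULL
  AND a.visible = 1
"""

ORIGINAL = BASE + """
  AND (a.city_location_id = 56
       OR a.city_location_id IN (
            SELECT c.id FROM city_locations c
            WHERE c.type IN ('Arrondissement')
              AND c.arrondissement_city_id = 56))
"""

# The OR now lives inside the EXISTS subquery.
PUSHED_DOWN = BASE + """
  AND EXISTS (
        SELECT * FROM city_locations c
        WHERE a.city_location_id = c.id
          AND ((c.type IN ('Arrondissement') AND c.arrondissement_city_id = 56)
               OR c.id = 56))
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)


def both():
    """Return (original average, pushed-down average)."""
    return (conn.execute(ORIGINAL).fetchone()[0],
            conn.execute(PUSHED_DOWN).fetchone()[0])


with_row = both()      # (15.0, 15.0): equivalent while city 56 exists
conn.execute("DELETE FROM city_locations WHERE id = 56")
without_row = both()   # (15.0, 20.0): the rewrite now drops advert 1
print(with_row, without_row)
```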

If the subquery's result set is small enough, you can also try moving it into a CTE.

Answer 2

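What seems to be happening in your case is that the subquery is executed once for every row of the parent result set, so its run time is multiplied by the number of records in the parent result set.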

One way to rewrite the query to make it faster is to use a WITH clause at the start of the query:

WITH cities AS (
    SELECT "city_locations"."id" AS id
    FROM "city_locations" 
    WHERE "city_locations"."type" IN ('Arrondissement') 
        AND "city_locations"."arrondissement_city_id" = 56
)
SELECT AVG("adverts"."price") 
FROM "adverts" 
WHERE "adverts"."type" IN ('Businesses::Restaurant') 
    AND "adverts"."discarded_at" IS NULL AND "adverts"."visible" = true 
    AND ("adverts"."city_location_id" = 56 
    OR "adverts"."city_location_id" IN (SELECT id FROM cities));

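This may speed things up when there are no records or only a few, but it still runs the SELECT for every record. It only reduces how often the city_locations table itself has to be accessed and filtered directly.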
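A minimal sketch, with hypothetical toy data and sqlite3 standing in for PostgreSQL, that checks the WITH cities AS (...) version returns the same result as the original IN (...) query. It does not measure speed. Note that since PostgreSQL 12 a side-effect-free, non-recursive CTE that is referenced only once is inlined into the main query unless it is declared AS MATERIALIZED, so on those versions the CTE by itself may not change the plan.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for PostgreSQL here.
SETUP = """
CREATE TABLE city_locations (id INTEGER PRIMARY KEY, type TEXT,
                             arrondissement_city_id INTEGER);
CREATE TABLE adverts (id INTEGER PRIMARY KEY, price INTEGER, type TEXT,
                      discarded_at TEXT, visible INTEGER,
                      city_location_id INTEGER);
INSERT INTO city_locations VALUES
    (56, 'City', NULL), (57, 'Arrondissement', 56), (58, 'City', NULL);
INSERT INTO adverts VALUES
    (1, 10, 'Businesses::Restaurant', NULL, 1, 56),
    (2, 20, 'Businesses::Restaurant', NULL, 1, 57),
    (3, 99, 'Businesses::Restaurant', NULL, 1, 58);
"""

# Shared WHERE clause; visible = 1 replaces PostgreSQL's visible = true.
FILTER = """
WHERE a.type IN ('Businesses::Restaurant')
  AND a.discarded_at IS NULL
  AND a.visible = 1
"""

ORIGINAL = "SELECT AVG(a.price) FROM adverts a" + FILTER + """
  AND (a.city_location_id = 56
       OR a.city_location_id IN (
            SELECT c.id FROM city_locations c
            WHERE c.type IN ('Arrondissement')
              AND c.arrondissement_city_id = 56))
"""

# Same query with the subquery moved into a CTE named cities.
WITH_CTE = """
WITH cities AS (
    SELECT c.id AS id FROM city_locations c
    WHERE c.type IN ('Arrondissement')
      AND c.arrondissement_city_id = 56
)
SELECT AVG(a.price) FROM adverts a""" + FILTER + """
  AND (a.city_location_id = 56
       OR a.city_location_id IN (SELECT id FROM cities))
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
orig_avg = conn.execute(ORIGINAL).fetchone()[0]
cte_avg = conn.execute(WITH_CTE).fetchone()[0]
print(orig_avg, cte_avg)  # 15.0 15.0
```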

Answer 3

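I think you may be assuming that the query inside IN is always executed first, as if it were standalone, and that its result is then fed to the outer query, at which point "it should realize the result is null, that nothing can match, and exit early". So you expect the big query to take about as long as it takes the database to find out that the small query produces NULL. In reality, the database optimizer rewrites the big query in some way, so it is not executed according to the conceptual execution model you have in mind. This time the database chose a suboptimal plan for it and spent longer joining data before discovering that the result was empty.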

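This is a very different situation from hard-coding NULL, as in SELECT * FROM table WHERE x IN (null): there is a specific optimization that recognizes this as a no-op, and the planner may even notice that an always-false condition like this does not need to be evaluated at all. If you are curious enough, the MySQL manual describes some fascinating details of query optimization :)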
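The difference between the hard-coded IN (NULL) and a subquery that returns no rows can be checked directly. This sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (the values 56 and 57 are just illustrative); both follow standard SQL three-valued logic here, so x IN (NULL) is NULL, x IN (empty subquery) is false, and in a WHERE clause either one leaves the = 56 comparison as the only way for the OR to be true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql):
    """Run a one-value query and return that value (None means SQL NULL)."""
    return conn.execute(sql).fetchone()[0]


in_null = scalar("SELECT 57 IN (NULL)")                   # NULL, not false
in_empty = scalar("SELECT 57 IN (SELECT 1 WHERE 0)")      # false (0)
match_56 = scalar("SELECT (56 = 56) OR (56 IN (NULL))")   # true OR NULL -> 1
other = scalar("SELECT (57 = 56) OR (57 IN (NULL))")      # false OR NULL -> NULL

# WHERE keeps a row only when the condition is true, so NULL filters it out.
kept = [row[0] for row in conn.execute(
    "SELECT x FROM (SELECT 56 AS x UNION ALL SELECT 57) "
    "WHERE x = 56 OR x IN (NULL)")]
print(in_null, in_empty, match_56, other, kept)  # None 0 1 None [56]
```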

Why is this query slow when the id is IN (a subselect that returns null)?
