How can I optimize this JOIN query?

Asked: 2018-07-06 16:59:58

Tags: sql postgresql query-performance

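According to pg_stat_statements, this query takes 900 ms on average. What is the recommended way to go about optimizing it? I do have indexes, but I'm not sure where the bottleneck is. Here is the EXPLAIN ANALYZE: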

EXPLAIN ANALYZE 
SELECT "listing_variants".* 
FROM "listing_variants" 
  INNER JOIN "links" ON "links"."listing_variant_id" = "listing_variants"."id" 
  INNER JOIN "product_variants" ON "product_variants"."id" = "links"."product_variant_id" 
  INNER JOIN "listings" ON "listing_variants"."listing_id" = "listings"."id" 
WHERE "listings"."sales_channel_id" = 31 
  AND "listing_variants"."is_linked" = 'f' 
  AND (listing_variants.available_quantity != product_variants.available_quantity);

which gives

    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5283.71..6960.01 rows=524 width=232) (actual time=54.138..54.138 rows=0 loops=1)
   Join Filter: (listing_variants.available_quantity <> product_variants.available_quantity)
   ->  Hash Join  (cost=5283.42..6648.69 rows=720 width=236) (actual time=54.137..54.137 rows=0 loops=1)
         Hash Cond: (links.listing_variant_id = listing_variants.id)
         ->  Index Only Scan using index_on_product_listing_variant_id on links  (cost=0.29..1205.14 rows=30643 width=8) (actual time=0.026..6.112 rows=30863 loops=1)
               Heap Fetches: 6799
         ->  Hash  (cost=5261.53..5261.53 rows=1728 width=232) (actual time=45.407..45.407 rows=368 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 65kB
               ->  Hash Join  (cost=1671.82..5261.53 rows=1728 width=232) (actual time=11.147..45.075 rows=368 loops=1)
                     Hash Cond: (listing_variants.listing_id = listings.id)
                     ->  Seq Scan on listing_variants  (cost=0.00..3412.77 rows=42577 width=232) (actual time=0.018..29.882 rows=42713 loops=1)
                           Filter: (NOT is_linked)
                           Rows Removed by Filter: 30863
                     ->  Hash  (cost=1661.68..1661.68 rows=811 width=4) (actual time=10.585..10.585 rows=811 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 29kB
                           ->  Bitmap Heap Scan on listings  (cost=30.57..1661.68 rows=811 width=4) (actual time=0.362..10.224 rows=811 loops=1)
                                 Recheck Cond: (sales_channel_id = 31)
                                 Heap Blocks: exact=668
                                 ->  Bitmap Index Scan on index_listings_on_sales_channel_ext_svc_updated  (cost=0.00..30.37 rows=811 width=0) (actual time=0.242..0.242 rows=821 loops=1)
                                       Index Cond: (sales_channel_id = 31)
   ->  Index Scan using product_variants_pkey on product_variants  (cost=0.29..0.42 rows=1 width=12) (never executed)
         Index Cond: (id = links.product_variant_id)
 Planning time: 1.437 ms
 Execution time: 54.366 ms

Thanks!

1 answer:

Answer 0 (score: 0)

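Prefer JOIN over EXISTS only when you need to select data from more than one table, which is not the case here. That is the first step of the optimization. In your case, a join can return the same listing_variants row several times, once for each matching row in the joined tables, which pollutes your result set.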

SELECT "listing_variants".* 
FROM "listing_variants" 
WHERE "listing_variants"."is_linked" = 'f' 
  AND EXISTS (
    SELECT 1 
    FROM "links" 
      JOIN "product_variants" ON "product_variants"."id" = "links"."product_variant_id" 
      JOIN "listings" ON "listings"."id" = "listing_variants"."listing_id" 
    WHERE "links"."listing_variant_id" = "listing_variants"."id" 
      AND "product_variants"."available_quantity" != "listing_variants"."available_quantity" 
      AND "listings"."sales_channel_id" = 31
  );
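
The duplicate-row effect mentioned above can be seen in a small, self-contained sketch. The temporary tables demo_parent and demo_child are hypothetical and exist only for illustration; they are not part of the asker's schema.

```sql
-- Hypothetical demo tables (not from the question's schema).
CREATE TEMP TABLE demo_parent (id int PRIMARY KEY);
CREATE TEMP TABLE demo_child  (parent_id int);

INSERT INTO demo_parent VALUES (1), (2);
INSERT INTO demo_child  VALUES (1), (1), (1);

-- JOIN: parent 1 appears once per matching child row (three times here).
SELECT demo_parent.*
FROM demo_parent
  JOIN demo_child ON demo_child.parent_id = demo_parent.id;

-- EXISTS: parent 1 appears once, no matter how many children match.
SELECT demo_parent.*
FROM demo_parent
WHERE EXISTS (
  SELECT 1 FROM demo_child WHERE demo_child.parent_id = demo_parent.id
);
```

Whether the original query actually produces duplicates depends on whether several links rows can point at the same listing variant, which the question does not state.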

Beyond that, the query is very simple; only good indexing and data partitioning can help optimize it further.
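
As one possible starting point on the indexing side: the posted plan spends most of its time in a sequential scan on listing_variants (about 30 of the 54 ms) before joining to the 811 listings of the sales channel. The sketch below assumes that no index on listing_variants(listing_id) exists yet, which the question does not confirm; the index name is made up. The planner may still prefer the sequential scan, so the plan should be re-checked afterwards.

```sql
-- Assumption: listing_variants has no index on listing_id yet.
-- The index name is hypothetical. CONCURRENTLY avoids blocking writes,
-- but cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY index_listing_variants_on_listing_id
    ON listing_variants (listing_id);

-- Re-check the plan; BUFFERS adds per-node block hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT listing_variants.*
FROM listing_variants
WHERE listing_variants.is_linked = 'f'
  AND EXISTS (
    SELECT 1
    FROM links
      JOIN product_variants ON product_variants.id = links.product_variant_id
      JOIN listings ON listings.id = listing_variants.listing_id
    WHERE links.listing_variant_id = listing_variants.id
      AND product_variants.available_quantity != listing_variants.available_quantity
      AND listings.sales_channel_id = 31
  );
```

Note that the EXPLAIN ANALYZE in the question reports about 54 ms, well below the 900 ms average from pg_stat_statements, so it may also be worth comparing plans for other sales_channel_id values.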