Question

I have the following table structure:

AdPerformance
   id
   ad_id
   impressions

Targeting
  value


AdActions
   app_starts

Ad
  id
  name
  parent_id

AdTargeting
  id
  targeting_
  ad_id

Targeting
  id
  name
  value

AdProduct
  id
  ad_id
  name

I need to aggregate the data, restricted by product name, so I wrote the following query:

 SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
 FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN (
        SELECT ad_id, value from targeting, ad_targeting 
        WHERE targeting.id = ad_targeting.id and targeting.name = 'gender' 
     ) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN 
       (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value

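However, when run with ANALYZE, the query above takes about 5 seconds.

Is there a way to improve it?

I have indexes on my foreign keys.

Update

Here is the ANALYZE output: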

                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=5787.28..5789.87 rows=259 width=254) (actual time=3283.763..3286.015 rows=5998 loops=1)
   Group Key: adobject_performance.ad_id, targeting.value
   Buffers: shared hit=3400223
   ->  Nested Loop Left Join  (cost=241.63..5603.63 rows=8162 width=254) (actual time=10.438..2774.664 rows=839720 loops=1)
         Buffers: shared hit=3400223
         ->  Nested Loop  (cost=241.21..1590.52 rows=8162 width=250) (actual time=10.412..703.818 rows=839720 loops=1)
               Join Filter: (adobject.id = adobject_performance.ad_id)
               Buffers: shared hit=36755
               ->  Hash Join  (cost=240.78..323.35 rows=9 width=226) (actual time=10.380..20.332 rows=5998 loops=1)
                     Hash Cond: (ad_product.ad_id = ad.id)
                     Buffers: shared hit=190
                     ->  HashAggregate  (cost=128.98..188.96 rows=5998 width=4) (actual time=3.788..6.821 rows=5998 loops=1)
                           Group Key: ad_product.ad_id
                           Buffers: shared hit=39
                           ->  Seq Scan on ad_product  (cost=0.00..113.99 rows=5998 width=4) (actual time=0.011..1.726 rows=5998 loops=1)
                                 Filter: ((product)::text = 'ft2_iPhone'::text)
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=39
                     ->  Hash  (cost=111.69..111.69 rows=9 width=222) (actual time=6.578..6.578 rows=5998 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 241kB
                           Buffers: shared hit=151
                           ->  Hash Join  (cost=30.26..111.69 rows=9 width=222) (actual time=0.154..4.660 rows=5998 loops=1)
                                 Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
                                 Buffers: shared hit=151
                                 ->  Seq Scan on adobject  (cost=0.00..77.97 rows=897 width=8) (actual time=0.009..1.449 rows=6001 loops=1)
                                       Buffers: shared hit=69
                                 ->  Hash  (cost=30.24..30.24 rows=2 width=222) (actual time=0.132..0.132 rows=2 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                       Buffers: shared hit=82
                                       ->  Nested Loop  (cost=0.15..30.24 rows=2 width=222) (actual time=0.101..0.129 rows=2 loops=1)
                                             Buffers: shared hit=82
                                             ->  Seq Scan on targeting  (cost=0.00..13.88 rows=2 width=222) (actual time=0.015..0.042 rows=79 loops=1)
                                                   Filter: (name = 'age group'::targeting_name)
                                                   Rows Removed by Filter: 82
                                                   Buffers: shared hit=1
                                             ->  Index Scan using advertising_targeting_pkey on adobject_targeting  (cost=0.15..8.17 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=79)
                                                   Index Cond: (id = targeting.id)
                                                   Buffers: shared hit=81
               ->  Index Scan using "fki_advertising_peformance_advertising_entity_id -> advertising" on adobject_performance  (cost=0.42..89.78 rows=4081 width=32) (actual time=0.007..0.046 rows=140 loops=5998)
                     Index Cond: (ad_id = ad_product.ad_id)
                     Buffers: shared hit=36565
         ->  Index Scan using facebook_advertising_actions_pkey on facebook_adobject_actions  (cost=0.42..0.48 rows=1 width=12) (actual time=0.001..0.002 rows=1 loops=839720)
               Index Cond: (ad_performance.id = ad_performance_id)
               Buffers: shared hit=3363468
 Planning time: 1.525 ms
 Execution time: 3287.324 ms
(46 rows)

Answer 1

This is a shot in the dark, since we haven't been given the EXPLAIN output, but Postgres should handle this query better if you pull the targeting subquery out into a CTE:

WITH targeting AS 
(
        SELECT ad_id, value from targeting, ad_targeting 
        WHERE targeting.id = ad_targeting.id and targeting.name = 'gender' 
)
SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN  targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN 
       (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value

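From the documentation:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects.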
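One caveat about the "evaluated only once" behavior: from PostgreSQL 12 onward, a side-effect-free CTE that is referenced only once may be inlined into the parent query, so it no longer acts as an optimization barrier by default. A minimal sketch of how to request the older behavior explicitly, assuming PostgreSQL 12 or later (the CTE name gender_targeting is hypothetical, chosen so it does not shadow the targeting table):

```sql
-- Assumes PostgreSQL 12+; on older versions drop the MATERIALIZED keyword,
-- since CTEs are always materialized there.
-- gender_targeting is a hypothetical name, not taken from the question.
WITH gender_targeting AS MATERIALIZED (
    SELECT ad_targeting.ad_id, targeting.value
    FROM targeting
    JOIN ad_targeting ON targeting.id = ad_targeting.id
    WHERE targeting.name = 'gender'
)
SELECT ad_id, value
FROM gender_targeting;
```

Whether materializing helps here is not certain; comparing the EXPLAIN ANALYZE output with and without the keyword is the only reliable check.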

Answer 2

I don't know whether this query will solve your problem, but give it a try:

 SELECT ad_performance.ad_id, targeting.value AS targeting_value, 
     sum(impressions) AS impressions, 
     sum(app_starts) AS app_starts
 FROM ad_performance
     LEFT JOIN ad on ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN ad_targeting on ad_targeting.ad_id = ad.parent_id
     INNER JOIN targeting on  targeting.id = ad_targeting.id and targeting.name = 'gender'   
     INNER JOIN ad_product on ad_product.ad_id = ad_performance.ad_id
WHERE ad_product.product = 'iphone'
GROUP BY ad_performance.ad_id, targeting_value

You might also create indexes on all the columns you use in ON or WHERE conditions.
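As a rough sketch of that suggestion, the statements below index the columns that appear in the ON and WHERE clauses of the query above. The index names are hypothetical, and some of these indexes may already exist, since the question states that the foreign keys are indexed:

```sql
-- Hypothetical index names; the columns come from the query's
-- ON and WHERE conditions. Skip any that already exist.
CREATE INDEX ad_product_product_idx ON ad_product (product);
CREATE INDEX ad_actions_ad_performance_id_idx ON ad_actions (ad_performance_id);
CREATE INDEX ad_parent_id_idx ON ad (parent_id);
CREATE INDEX ad_targeting_ad_id_idx ON ad_targeting (ad_id);
CREATE INDEX targeting_name_idx ON targeting (name);
```

The planner may still prefer sequential scans on small tables or when a filter matches most rows, so it is worth re-running EXPLAIN ANALYZE afterwards to see whether the new indexes are used at all.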

Answer 3

The execution plan no longer seems to match the query (perhaps you could update the query).

However, the problem right now is this:

->  Hash Join  (cost=30.26..111.69 rows=9 width=222)
               (actual time=0.154..4.660 rows=5998 loops=1)
      Hash Cond: (adobject.parent_id = adobject_targeting.ad_id)
      Buffers: shared hit=151
      ->  Seq Scan on adobject  (cost=0.00..77.97 rows=897 width=8)
                                (actual time=0.009..1.449 rows=6001 loops=1)
            Buffers: shared hit=69
      ->  Hash  (cost=30.24..30.24 rows=2 width=222)
                (actual time=0.132..0.132 rows=2 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 1kB
            Buffers: shared hit=82
            ->  Nested Loop  (cost=0.15..30.24 rows=2 width=222)
                             (actual time=0.101..0.129 rows=2 loops=1)
                  Buffers: shared hit=82
                  ->  Seq Scan on targeting  (cost=0.00..13.88 rows=2 width=222)
                                             (actual time=0.015..0.042 rows=79 loops=1)
                        Filter: (name = 'age group'::targeting_name)
                        Rows Removed by Filter: 82
                        Buffers: shared hit=1
                  ->  Index Scan using advertising_targeting_pkey on adobject_targeting
                                             (cost=0.15..8.17 rows=1 width=8)
                                             (actual time=0.001..0.001 rows=0 loops=79)
                        Index Cond: (id = targeting.id)
                        Buffers: shared hit=81

This is the join between adobject and the result of

targeting JOIN adobject_targeting
   USING (id)
WHERE targeting.name = 'age group'

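The latter subquery is correctly estimated at 2 rows, but PostgreSQL fails to notice that almost every row found in adobject will match one of those two rows, so the join produces about 6000 rows instead of the 9 it estimates.

This causes the optimizer to wrongly choose a nested loop join later on, which is where more than half of the query time is spent.

Unfortunately, since PostgreSQL has no cross-table statistics, it has no way of knowing better.

A crude measure is to SET enable_nestloop=off, but that will degrade the performance of the other (correctly chosen) nested loop join, so I don't know whether it will be a net win. If it helps, you could change the parameter only for the duration of the query (using a transaction and SET LOCAL).

There may be a way to rewrite the query so that a better plan is found, but that is hard to say without knowing the exact query.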
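A minimal sketch of the transaction-scoped variant, using the query as posted in the question (which, as noted, may differ from the query that produced the plan):

```sql
BEGIN;

-- SET LOCAL only lasts until the end of this transaction,
-- so other queries in the session keep the default planner settings.
SET LOCAL enable_nestloop = off;

-- The query from the question, unchanged. EXPLAIN (ANALYZE, BUFFERS)
-- executes it and prints the plan for comparison with the original one.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ad_performance.ad_id, targeting.value AS targeting_value,
       sum(impressions) AS impressions,
       sum(app_starts) AS app_starts
FROM ad_performance
     LEFT JOIN ad ON ad.id = ad_performance.ad_id
     LEFT JOIN ad_actions ON ad_performance.id = ad_actions.ad_performance_id
     RIGHT JOIN (
        SELECT ad_id, value FROM targeting, ad_targeting
        WHERE targeting.id = ad_targeting.id AND targeting.name = 'gender'
     ) targeting ON targeting.ad_id = ad.parent_id
WHERE ad_performance.ad_id IN
      (SELECT ad_id FROM ad_product WHERE product = 'iphone')
GROUP BY ad_performance.ad_id, targeting_value;

COMMIT;
```

If the execution time reported with the setting disabled is not clearly lower than the original, the setting is probably not worth keeping.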

Improving the run time of a SQL query
