How to remove the nested loop from a multi-join query in PostgreSQL 10.3

Time: 2018-03-14 03:54:28

Tags: sql postgresql query-optimization postgresql-performance

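I have a table named sources that holds prices, and another table named destinations that holds another set of values. I need every destination for every source, so I do a cross join that multiplies each value in the sources table by each value in the destinations table. source_id and destination_id are primary keys. I then want to inner join the resulting table with another table, and that currently gives me a nested loop.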

APPROACH 1

-- has a nested loop
EXPLAIN SELECT * FROM 
(select concat(s.source_id, ':', d.destination_id) AS pair_id, 
(s.price * d.price) AS pair_price 
FROM e1_sources s 
CROSS JOIN e1_destinations d) AS p
INNER JOIN e1_alerts a
ON a.pair=p.pair_id
WHERE 
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)

APPROACH 2

-- has a nested loop
EXPLAIN WITH pairs AS 
(
    SELECT 
    concat(s.source_id, ':', d.destination_id) AS pair_id,
    (s.price * d.price) AS pair_price
    FROM e1_sources s
    CROSS JOIN e1_destinations d
)
SELECT * from pairs p
INNER JOIN e1_alerts a
ON p.pair_id=a.pair
WHERE 
(p.pair_price > a.value AND a.direction=true) OR 
(p.pair_price <= a.value AND a.direction=false)
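Both queries above contain a join with no join condition. PostgreSQL's hash join and merge join need an equality condition to work with, so for a bare CROSS JOIN a nested loop (here over a materialized inner side) is the only join method available. A minimal sketch for checking this, assuming the tables from the post: it discourages nested loops for one transaction and re-runs EXPLAIN on the cross join alone. `enable_nestloop` is a standard planner setting; it penalizes nested loops rather than forbidding them, so the nested loop should still appear when there is no alternative.

```sql
-- Sketch: discourage nested loops for this transaction only, then re-check the plan.
BEGIN;
SET LOCAL enable_nestloop = off;  -- adds a large cost penalty to nested loops

EXPLAIN
SELECT concat(s.source_id, ':', d.destination_id) AS pair_id,
       s.price * d.price AS pair_price
FROM e1_sources s
CROSS JOIN e1_destinations d;

ROLLBACK;  -- SET LOCAL does not outlive the transaction
```

If the nested loop is still there with a very high estimated cost, the planner had no other join method to choose from, and knowing that pair_id is unique would not change that part of the plan.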

APPROACH 1 ANALYZE

"Hash Join  (cost=3697.72..210978.26 rows=1297875 width=114)"
"  Hash Cond: (concat(s.source_id, ':', d.destination_id) = (a.pair)::text)"
"  Join Filter: ((((s.price * d.price) > a.value) AND a.direction) OR (((s.price * d.price) <= a.value) AND (NOT a.direction)))"
"  ->  Nested Loop  (cost=0.00..19303.43 rows=1540440 width=70)"
"        ->  Seq Scan on e1_sources s  (cost=0.00..25.56 rows=1556 width=16)"
"        ->  Materialize  (cost=0.00..24.85 rows=990 width=54)"
"              ->  Seq Scan on e1_destinations d  (cost=0.00..19.90 rows=990 width=54)"
"  ->  Hash  (cost=2025.00..2025.00 rows=75098 width=50)"
"        ->  Seq Scan on e1_alerts a  (cost=0.00..2025.00 rows=75098 width=50)"
"              Filter: (direction OR (NOT direction))"

APPROACH 2 ANALYZE

"Hash Join  (cost=56349.38..649740.92 rows=7089424 width=114)"
"  Hash Cond: (p.pair_id = (a.pair)::text)"
"  Join Filter: (((p.pair_price > a.value) AND a.direction) OR ((p.pair_price <= a.value) AND (NOT a.direction)))"
"  CTE pairs"
"    ->  Nested Loop  (cost=0.00..19378.74 rows=1104760 width=64)"
"          ->  Seq Scan on e1_sources s  (cost=0.00..26.56 rows=1556 width=16)"
"          ->  Materialize  (cost=0.00..20.65 rows=710 width=54)"
"                ->  Seq Scan on e1_destinations d  (cost=0.00..17.10 rows=710 width=54)"
"  ->  CTE Scan on pairs p  (cost=0.00..22095.20 rows=1104760 width=64)"
"  ->  Hash  (cost=20248.06..20248.06 rows=751007 width=50)"
"        ->  Seq Scan on e1_alerts a  (cost=0.00..20248.06 rows=751007 width=50)"
"              Filter: (direction OR (NOT direction))"

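However, if I have a separate table that already holds the cross-join product keyed by pair_id and inner join against that table, the plan shows only a hash join over sequential scans and the query takes only milliseconds.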

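APPROACH 3

I have a materialized view called pairs that contains the cross join of sources and destinations, with the concatenated pair_id as its primary key. The inner join now takes only milliseconds because it no longer performs the nested loop.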

EXPLAIN ANALYZE 
SELECT * from pairs p 
INNER JOIN e1_alerts a
ON p.pair_id = a.pair 
WHERE 
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
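The post does not show how pairs is defined. The following is a sketch of one possible definition, assuming it is built from the same cross join used in approaches 1 and 2. The index name `pairs_pair_id_idx` is hypothetical, and the unique index stands in for the primary key the post mentions.

```sql
-- Sketch: precompute the cross join once instead of on every alert query.
CREATE MATERIALIZED VIEW pairs AS
SELECT concat(s.source_id, ':', d.destination_id) AS pair_id,
       s.price * d.price AS pair_price
FROM e1_sources s
CROSS JOIN e1_destinations d;

-- Unique index on pair_id; REFRESH ... CONCURRENTLY requires one.
CREATE UNIQUE INDEX pairs_pair_id_idx ON pairs (pair_id);

-- Run after each batch of updates to e1_sources / e1_destinations.
-- CONCURRENTLY lets readers keep querying pairs during the refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY pairs;

-- Refresh planner statistics for the rebuilt view.
ANALYZE pairs;
```

With this layout only the source and destination rows are sent to the database; the pair rows are regenerated on the server by the refresh.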

APPROACH 3 ANALYZE

"Hash Join  (cost=1459.32..4892.41 rows=30566 width=73) (actual time=14.048..92.158 rows=498 loops=1)"
"  Hash Cond: ((a.pair)::text = p.pair_id)"
"  Join Filter: (((p.pair_price > a.value) AND a.direction) OR ((p.pair_price <= a.value) AND (NOT a.direction)))"
"  Rows Removed by Join Filter: 99502"
"  ->  Seq Scan on e1_alerts a  (cost=0.00..2025.00 rows=75098 width=50) (actual time=0.010..16.658 rows=100000 loops=1)"
"        Filter: (direction OR (NOT direction))"
"  ->  Hash  (cost=836.92..836.92 rows=49792 width=23) (actual time=13.736..13.736 rows=49792 loops=1)"
"        Buckets: 65536  Batches: 1  Memory Usage: 3245kB"
"        ->  Seq Scan on pairs p  (cost=0.00..836.92 rows=49792 width=23) (actual time=0.005..5.029 rows=49792 loops=1)"
"Planning time: 0.494 ms"
"Execution time: 92.262 ms"

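A few questions

  1. Do approaches 1 and 2 use a nested loop because the planner does not know whether pair_id is a primary key? Is there a way to tell PostgreSQL that a particular column produced by the cross join is unique?
  2. Is there another way besides a materialized view? My sources x destinations table will hold 80,000 values in the worst case and needs to be updated every x minutes, and I do not want to send that many updates to the database. If I only send the roughly 2,000 source and destination values, I can generate the pairs table from the cross join.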
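For question 2, one option that is not tried in the post is to avoid building the pairs at all: start from e1_alerts and look up the matching source and destination by splitting a.pair. This is only a sketch under several assumptions: the ids are text-like columns (cast them with `::text` otherwise), no id contains a `:`, and a.pair always has exactly the form source_id:destination_id. `split_part` is a built-in PostgreSQL string function.

```sql
-- Sketch: drive the query from alerts and join each half of "pair" to its table.
SELECT a.*,
       s.price * d.price AS pair_price
FROM e1_alerts a
INNER JOIN e1_sources s
        ON s.source_id = split_part(a.pair, ':', 1)       -- text before the ':'
INNER JOIN e1_destinations d
        ON d.destination_id = split_part(a.pair, ':', 2)  -- text after the ':'
WHERE (s.price * d.price >  a.value AND a.direction)
   OR (s.price * d.price <= a.value AND NOT a.direction);
```

Both joins now have an equality condition on a primary key, so the planner is free to use hash or merge joins and never has to produce the full sources x destinations product. Whether this is faster than the materialized view would need to be measured with EXPLAIN ANALYZE on the real data.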

1 Answer:

Answer 0 (score: 0)

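Well, I found a solution that is 100 times faster than anything I tried above, but I do not know why. When I did the cross join in approaches 1 and 2, the two tables had no column in common. To turn the cross join into an inner join, I simply added the same column to both tables, filled it with the same repeated value, and used that column as a pretext for an INNER JOIN. The difference in performance is huge!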

APPROACH 4

explain analyze SELECT * 
FROM 
(select concat(s.source_id, ':', d.destination_id) as pair_id, 
(s.price * d.price) as pair_price 
FROM e1_sources s 
INNER JOIN e1_destinations d 
ON s.destination_id=d.source_id) as p
INNER JOIN e1_alerts a
ON a.pair=p.pair_id
WHERE 
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
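Since the join column holds the same value in every row, the inner join should return exactly the rows of the cross join. A quick sanity check, sketched here with the table and column names from the query above, is to compare the two row counts:

```sql
-- Sketch: the inner join on the constant column should yield the full cross product.
SELECT
  (SELECT count(*) FROM e1_sources)
    * (SELECT count(*) FROM e1_destinations) AS cross_join_rows,
  (SELECT count(*)
   FROM e1_sources s
   INNER JOIN e1_destinations d
           ON s.destination_id = d.source_id) AS inner_join_rows;
```

If the two numbers differ, some rows are missing the shared value and the rewritten query silently drops pairs. In the plan for this approach the merge join reports 49792 rows, the same number of rows as the pairs view in approach 3.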

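Is this a way of tricking the query optimizer into believing it is doing an inner join? Joining the same number of rows under the pretext of an inner join has completely eliminated the NESTED LOOP! I would appreciate it if someone could explain why this happens.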

APPROACH 4 ANALYZE

"Hash Join  (cost=456.66..712.93 rows=1862 width=114) (actual time=4.702..67.509 rows=51 loops=1)"
"  Hash Cond: (concat(s.source_id, ':', d.destination_id) = (a.pair)::text)"
"  Join Filter: ((((s.price * d.price) > a.value) AND a.direction) OR (((s.price * d.price) <= a.value) AND (NOT a.direction)))"
"  Rows Removed by Join Filter: 9949"
"  ->  Merge Join  (cost=159.78..246.19 rows=5524 width=70) (actual time=0.630..13.783 rows=49792 loops=1)"
"        Merge Cond: ((d.source_id)::text = (s.destination_id)::text)"
"        ->  Sort  (cost=50.72..52.50 rows=710 width=86) (actual time=0.042..0.049 rows=32 loops=1)"
"              Sort Key: d.source_id"
"              Sort Method: quicksort  Memory: 27kB"
"              ->  Seq Scan on e1_destinations d  (cost=0.00..17.10 rows=710 width=86) (actual time=0.020..0.025 rows=32 loops=1)"
"        ->  Sort  (cost=109.06..112.95 rows=1556 width=20) (actual time=0.583..4.144 rows=49761 loops=1)"
"              Sort Key: s.destination_id"
"              Sort Method: quicksort  Memory: 167kB"
"              ->  Seq Scan on e1_sources s  (cost=0.00..26.56 rows=1556 width=20) (actual time=0.010..0.268 rows=1556 loops=1)"
"  ->  Hash  (cost=203.00..203.00 rows=7510 width=50) (actual time=3.507..3.507 rows=10000 loops=1)"
"        Buckets: 16384 (originally 8192)  Batches: 1 (originally 1)  Memory Usage: 949kB"
"        ->  Seq Scan on e1_alerts a  (cost=0.00..203.00 rows=7510 width=50) (actual time=0.013..1.771 rows=10000 loops=1)"
"              Filter: (direction OR (NOT direction))"
"Planning time: 0.251 ms"
"Execution time: 67.590 ms"