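I have a table named sources that contains prices, and another table named destinations that contains another set of values. I need every destination for every source, so I do a cross join that multiplies each value from the sources table by each value in the destinations table. source_id and destination_id are the primary keys. I want to inner join the resulting table with another table, and that currently gives me a nested loop.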
APPROACH 1
-- has a nested loop
EXPLAIN SELECT * FROM
(select concat(s.source_id, ':', d.destination_id) AS pair_id,
(s.price * d.price) AS pair_price
FROM e1_sources s
CROSS JOIN e1_destinations d) AS p
INNER JOIN e1_alerts a
ON a.pair=p.pair_id
WHERE
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
APPROACH 2
-- has a nested loop
EXPLAIN WITH pairs AS
(
SELECT
concat(s.source_id, ':', d.destination_id) AS pair_id,
(s.price * d.price) AS pair_price
FROM e1_sources s
CROSS JOIN e1_destinations d
)
SELECT * from pairs p
INNER JOIN e1_alerts a
ON p.pair_id=a.pair
WHERE
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
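As a side note on the WHERE clause shared by all the approaches: the two OR branches can be folded into a single boolean comparison. The sketch below is an equivalent way of writing the same filter for Approach 2 (rows where the comparison or a.direction is NULL are dropped either way). It is only a rewrite of the predicate, with no claim that it changes the join strategy.

```sql
-- Same query as Approach 2, with the OR folded into one comparison:
--   direction = true  keeps rows where pair_price >  value
--   direction = false keeps rows where pair_price <= value
EXPLAIN WITH pairs AS
(
    SELECT
        concat(s.source_id, ':', d.destination_id) AS pair_id,
        (s.price * d.price) AS pair_price
    FROM e1_sources s
    CROSS JOIN e1_destinations d
)
SELECT *
FROM pairs p
INNER JOIN e1_alerts a
        ON p.pair_id = a.pair
WHERE (p.pair_price > a.value) = a.direction;
```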
APPROACH 1 ANALYZE
"Hash Join (cost=3697.72..210978.26 rows=1297875 width=114)"
" Hash Cond: (concat(s.source_id, ':', d.destination_id) = (a.pair)::text)"
" Join Filter: ((((s.price * d.price) > a.value) AND a.direction) OR (((s.price * d.price) <= a.value) AND (NOT a.direction)))"
" -> Nested Loop (cost=0.00..19303.43 rows=1540440 width=70)"
" -> Seq Scan on e1_sources s (cost=0.00..25.56 rows=1556 width=16)"
" -> Materialize (cost=0.00..24.85 rows=990 width=54)"
" -> Seq Scan on e1_destinations d (cost=0.00..19.90 rows=990 width=54)"
" -> Hash (cost=2025.00..2025.00 rows=75098 width=50)"
" -> Seq Scan on e1_alerts a (cost=0.00..2025.00 rows=75098 width=50)"
" Filter: (direction OR (NOT direction))"
APPROACH 2 ANALYZE
"Hash Join (cost=56349.38..649740.92 rows=7089424 width=114)"
" Hash Cond: (p.pair_id = (a.pair)::text)"
" Join Filter: (((p.pair_price > a.value) AND a.direction) OR ((p.pair_price <= a.value) AND (NOT a.direction)))"
" CTE pairs"
" -> Nested Loop (cost=0.00..19378.74 rows=1104760 width=64)"
" -> Seq Scan on e1_sources s (cost=0.00..26.56 rows=1556 width=16)"
" -> Materialize (cost=0.00..20.65 rows=710 width=54)"
" -> Seq Scan on e1_destinations d (cost=0.00..17.10 rows=710 width=54)"
" -> CTE Scan on pairs p (cost=0.00..22095.20 rows=1104760 width=64)"
" -> Hash (cost=20248.06..20248.06 rows=751007 width=50)"
" -> Seq Scan on e1_alerts a (cost=0.00..20248.06 rows=751007 width=50)"
" Filter: (direction OR (NOT direction))"
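However, if I keep the cross-join product in a separate table keyed by pair_id and inner join against that, the plan shows a hash join and the query takes only milliseconds.
APPROACH 3
I have a materialized view called pairs that contains the cross join of sources and destinations, with the concatenated pair_id as its primary key. The inner join now takes only milliseconds (about 92 ms in the plan below) because it does not perform a nested loop.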
EXPLAIN ANALYZE
SELECT * from pairs p
INNER JOIN e1_alerts a
ON p.pair_id = a.pair
WHERE
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
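For reference, the pairs materialized view that Approach 3 queries could be defined roughly as follows. This is only a sketch under assumptions: the SELECT is copied from Approaches 1 and 2, the index name pairs_pair_id_idx is made up for illustration, and a unique index stands in for the "primary key", since PostgreSQL materialized views take indexes rather than PRIMARY KEY constraints.

```sql
-- Precompute every source/destination pair once.
CREATE MATERIALIZED VIEW pairs AS
SELECT
    concat(s.source_id, ':', d.destination_id) AS pair_id,
    (s.price * d.price) AS pair_price
FROM e1_sources s
CROSS JOIN e1_destinations d;

-- Hypothetical index name; enforces one row per pair_id.
CREATE UNIQUE INDEX pairs_pair_id_idx ON pairs (pair_id);

-- The view is a snapshot: rebuild it whenever prices change.
REFRESH MATERIALIZED VIEW pairs;
```

The trade-off is that the join cost moves from query time to refresh time, so the results are only as fresh as the last REFRESH.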
APPROACH 3 ANALYZE
"Hash Join (cost=1459.32..4892.41 rows=30566 width=73) (actual time=14.048..92.158 rows=498 loops=1)"
" Hash Cond: ((a.pair)::text = p.pair_id)"
" Join Filter: (((p.pair_price > a.value) AND a.direction) OR ((p.pair_price <= a.value) AND (NOT a.direction)))"
" Rows Removed by Join Filter: 99502"
" -> Seq Scan on e1_alerts a (cost=0.00..2025.00 rows=75098 width=50) (actual time=0.010..16.658 rows=100000 loops=1)"
" Filter: (direction OR (NOT direction))"
" -> Hash (cost=836.92..836.92 rows=49792 width=23) (actual time=13.736..13.736 rows=49792 loops=1)"
" Buckets: 65536 Batches: 1 Memory Usage: 3245kB"
" -> Seq Scan on pairs p (cost=0.00..836.92 rows=49792 width=23) (actual time=0.005..5.029 rows=49792 loops=1)"
"Planning time: 0.494 ms"
"Execution time: 92.262 ms"
A few questions
Answer 0 (score: 0)
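OK, I found a solution that is 100 times faster than anything I tried above, but I don't know why. When I cross join the two tables in Approaches 1 and 2, they have no column in common. To turn the cross join into an inner join, I simply added a column to each table, filled both with the same repeated value, and used that column as a pretext for an INNER JOIN. The difference in performance is huge!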
APPROACH 4
explain analyze SELECT *
FROM
(select concat(s.source_id, ':', d.destination_id) as pair_id,
(s.price * d.price) as pair_price
FROM e1_sources s
INNER JOIN e1_destinations d
ON s.destination_id=d.source_id) as p
INNER JOIN e1_alerts a
ON a.pair=p.pair_id
WHERE
(p.pair_price > a.value AND a.direction=true) OR
(p.pair_price <= a.value AND a.direction=false)
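Is this a way of tricking the query optimizer into believing it is doing an inner join? Joining the same number of rows under the pretext of an inner join has completely eliminated the NESTED LOOP! I would appreciate it if someone could explain why this happens.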
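Some numbers in the plan below are worth checking before trusting the speed-up. The Merge Join is estimated at 5524 rows but actually returns 49792, which is 1556 * 32, i.e. every source still paired with every destination. The scan of e1_destinations is estimated at 710 rows but returns 32. Approaches 1 and 2 were also run with plain EXPLAIN, so there is no actual time to compare against. One possible reading (an assumption, not something the plans prove) is that the planner's statistics are stale or missing and the pretext join only changes its row estimate. A rough sketch of how to check, using only the tables and columns already shown:

```sql
-- 1. Does the pretext join still return the full cross product?
SELECT
    (SELECT count(*) FROM e1_sources)
  * (SELECT count(*) FROM e1_destinations) AS cross_join_rows,
    (SELECT count(*)
       FROM e1_sources s
       INNER JOIN e1_destinations d
               ON s.destination_id = d.source_id) AS pretext_join_rows;

-- 2. Refresh the planner statistics for all three tables.
ANALYZE e1_sources;
ANALYZE e1_destinations;
ANALYZE e1_alerts;

-- 3. Re-run Approach 1 with EXPLAIN ANALYZE to get actual timings.
EXPLAIN ANALYZE
SELECT *
FROM (SELECT concat(s.source_id, ':', d.destination_id) AS pair_id,
             (s.price * d.price) AS pair_price
      FROM e1_sources s
      CROSS JOIN e1_destinations d) AS p
INNER JOIN e1_alerts a
        ON a.pair = p.pair_id
WHERE (p.pair_price > a.value AND a.direction = true)
   OR (p.pair_price <= a.value AND a.direction = false);
```

If the two counts in step 1 match, both queries do the same amount of join work, and any remaining difference comes from the plan chosen and the size of e1_alerts (10000 rows in the plan below versus 100000 in Approach 3).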
APPROACH 4 ANALYZE
"Hash Join (cost=456.66..712.93 rows=1862 width=114) (actual time=4.702..67.509 rows=51 loops=1)"
" Hash Cond: (concat(s.source_id, ':', d.destination_id) = (a.pair)::text)"
" Join Filter: ((((s.price * d.price) > a.value) AND a.direction) OR (((s.price * d.price) <= a.value) AND (NOT a.direction)))"
" Rows Removed by Join Filter: 9949"
" -> Merge Join (cost=159.78..246.19 rows=5524 width=70) (actual time=0.630..13.783 rows=49792 loops=1)"
" Merge Cond: ((d.source_id)::text = (s.destination_id)::text)"
" -> Sort (cost=50.72..52.50 rows=710 width=86) (actual time=0.042..0.049 rows=32 loops=1)"
" Sort Key: d.source_id"
" Sort Method: quicksort Memory: 27kB"
" -> Seq Scan on e1_destinations d (cost=0.00..17.10 rows=710 width=86) (actual time=0.020..0.025 rows=32 loops=1)"
" -> Sort (cost=109.06..112.95 rows=1556 width=20) (actual time=0.583..4.144 rows=49761 loops=1)"
" Sort Key: s.destination_id"
" Sort Method: quicksort Memory: 167kB"
" -> Seq Scan on e1_sources s (cost=0.00..26.56 rows=1556 width=20) (actual time=0.010..0.268 rows=1556 loops=1)"
" -> Hash (cost=203.00..203.00 rows=7510 width=50) (actual time=3.507..3.507 rows=10000 loops=1)"
" Buckets: 16384 (originally 8192) Batches: 1 (originally 1) Memory Usage: 949kB"
" -> Seq Scan on e1_alerts a (cost=0.00..203.00 rows=7510 width=50) (actual time=0.013..1.771 rows=10000 loops=1)"
" Filter: (direction OR (NOT direction))"
"Planning time: 0.251 ms"
"Execution time: 67.590 ms"