事实:
现在这里是解释的查询:
WITH RECURSIVE
rows AS
(
SELECT *
FROM (
SELECT r.id, r.set, r.parent, r.masterid
FROM d_storage.node_dataset r
WHERE masterid = 3533933
) q
UNION ALL
SELECT *
FROM (
SELECT c.id, c.set, c.parent, r.masterid
FROM rows r
JOIN a_storage.node c
ON c.parent = r.id
) q
)
SELECT r.masterid, r.id AS nodeid
FROM rows r
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on rows r (cost=2742105.92..2862119.94 rows=6000701 width=16) (actual time=0.033..172111.204 rows=4 loops=1)
CTE rows
-> Recursive Union (cost=0.00..2742105.92 rows=6000701 width=28) (actual time=0.029..172111.183 rows=4 loops=1)
-> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.025..0.027 rows=1 loops=1)
Index Cond: (masterid = 3533933)
-> Hash Join (cost=0.33..262208.33 rows=600070 width=28) (actual time=40628.371..57370.361 rows=1 loops=3)
Hash Cond: (c.parent = r.id)
-> Append (cost=0.00..211202.04 rows=12001404 width=20) (actual time=0.011..46365.669 rows=12000004 loops=3)
-> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.002..0.002 rows=0 loops=3)
-> Seq Scan on node_dataset c (cost=0.00..55001.01 rows=3000001 width=20) (actual time=0.007..3426.593 rows=3000001 loops=3)
-> Seq Scan on node_stammdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=0.008..9049.189 rows=3000001 loops=3)
-> Seq Scan on node_stammdaten_adresse c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=3.455..8381.725 rows=3000001 loops=3)
-> Seq Scan on node_testdaten c (cost=0.00..52059.01 rows=3000001 width=20) (actual time=1.810..5259.178 rows=3000001 loops=3)
-> Hash (cost=0.20..0.20 rows=10 width=16) (actual time=0.010..0.010 rows=1 loops=3)
-> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.002..0.004 rows=1 loops=3)
Total runtime: 172111.371 ms
(16 rows)
(END)
到目前为止,计划程序决定选择散列连接(好)但没有索引(坏)。
现在执行以下操作后:
SET enable_hashjoins TO false;
解释的查询如下:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on rows r (cost=15198247.00..15318261.02 rows=6000701 width=16) (actual time=0.038..49.221 rows=4 loops=1)
CTE rows
-> Recursive Union (cost=0.00..15198247.00 rows=6000701 width=28) (actual time=0.032..49.201 rows=4 loops=1)
-> Index Scan using node_dataset_masterid on node_dataset r (cost=0.00..8.60 rows=1 width=28) (actual time=0.028..0.031 rows=1 loops=1)
Index Cond: (masterid = 3533933)
-> Nested Loop (cost=0.00..1507822.44 rows=600070 width=28) (actual time=10.384..16.382 rows=1 loops=3)
Join Filter: (r.id = c.parent)
-> WorkTable Scan on rows r (cost=0.00..0.20 rows=10 width=16) (actual time=0.001..0.003 rows=1 loops=3)
-> Append (cost=0.00..113264.67 rows=3001404 width=20) (actual time=8.546..12.268 rows=1 loops=4)
-> Seq Scan on node c (cost=0.00..24.00 rows=1400 width=20) (actual time=0.001..0.001 rows=0 loops=4)
-> Bitmap Heap Scan on node_dataset c (cost=58213.87..113214.88 rows=3000001 width=20) (actual time=1.906..1.906 rows=0 loops=4)
Recheck Cond: (c.parent = r.id)
-> Bitmap Index Scan on node_dataset_parent (cost=0.00..57463.87 rows=3000001 width=0) (actual time=1.903..1.903 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_stammdaten_parent on node_stammdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=3.272..3.273 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_stammdaten_adresse_parent on node_stammdaten_adresse c (cost=0.00..8.60 rows=1 width=20) (actual time=4.333..4.333 rows=0 loops=4)
Index Cond: (c.parent = r.id)
-> Index Scan using node_testdaten_parent on node_testdaten c (cost=0.00..8.60 rows=1 width=20) (actual time=2.745..2.746 rows=0 loops=4)
Index Cond: (c.parent = r.id)
Total runtime: 49.349 ms
(21 rows)
(END)
- >非常快,因为使用了索引。 注意:第二个查询的成本略高于第一个查询的成本。
所以主要的问题是:为什么规划者做出第一个决定,而不是第二个?
还有意思:
经
SET enable_seqscan为false;
我很温暖。禁用seq扫描。比规划师使用索引和散列连接,查询仍然很慢。所以问题似乎是哈希联接。也许有人可以在这种令人困惑的情况下提供帮助?
thx,R。
答案 0 :(得分:1)
如果您的解释与现实存在显着差异(费用较低但时间较长),则您的统计信息可能已过期,或者是非代表性样本。
再次尝试新的统计数据。如果这没有用,请再次增加样本量并构建统计数据。
答案 1 :(得分:0)
试试这个:
设置seq_page_cost ='4.0'; set random_page_cost ='1.0; 解释分析......
在您的会话中执行此操作,看看它是否有所作为。
答案 2 :(得分:0)
散列连接期望产生600070行,但只有4个(在3个循环中,每个循环平均1行)。如果600070准确,那么散列连接可能是合适的。