The following query is fast:
SELECT a.id, b.id, c.id
FROM a
FULL OUTER JOIN b ON a.id = b.id
FULL OUTER JOIN c ON a.id = c.id
WHERE a.some > 5 AND a.thing < 10
However, putting WHERE conditions on multiple tables causes a performance hit on my dataset of roughly 10 million:
SELECT a.id, b.id, c.id
FROM a
FULL OUTER JOIN b ON a.id = b.id
FULL OUTER JOIN c ON a.id = c.id
WHERE (a.some > 5 AND a.thing < 10)
OR (b.some > 5 AND b.thing < 10)
OR (c.some > 5 AND c.thing < 10)
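How can I improve the query to make it more efficient? Thanks!
Edit:
Here is the EXPLAIN output for the actual queries (the table names are a bit different):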
SELECT
ohh.hour
FROM org_hour_host ohh
FULL OUTER JOIN org_hour_timeseries ohs ON ohh.org_id = ohs.org_id
FULL OUTER JOIN org_hour_vs_host ohah ON ohh.org_id = ohah.org_id
WHERE (ohh.org_id IN (10) OR ohs.org_id IN (10) OR ohah.org_id IN (10))
XN Hash Full Join DS_DIST_OUTER (cost=6682944.40..234919986923528960.00 rows=1934276754413 width=8)
  Outer Dist Key: "outer".org_id
  Hash Cond: ("outer".org_id = "inner".org_id)
  Filter: (("inner".org_id = 10) OR ("outer".org_id = 10) OR ("outer".org_id = 10))
  -> XN Hash Full Join DS_DIST_NONE (cost=3050316.80..38694799792.93 rows=1934276754413 width=16)
       Hash Cond: ("outer".org_id = "inner".org_id)
       -> XN Seq Scan on org_hour_host ohh (cost=0.00..3130270.08 rows=313027008 width=12)
       -> XN Hash (cost=2440253.44..2440253.44 rows=244025344 width=4)
            -> XN Seq Scan on org_hour_timeseries ohs (cost=0.00..2440253.44 rows=244025344 width=4)
  -> XN Hash (cost=2906102.08..2906102.08 rows=290610208 width=4)
       -> XN Seq Scan on org_hour_vs_host ohah (cost=0.00..2906102.08 rows=290610208 width=4)
(11 rows)
SELECT
ohh.hour
FROM org_hour_host ohh
FULL OUTER JOIN org_hour_timeseries ohs ON ohh.org_id = ohs.org_id
FULL OUTER JOIN org_hour_vs_host ohah ON ohh.org_id = ohah.org_id
WHERE ohh.org_id IN (10)
XN Merge Left Join DS_DIST_NONE (cost=0.00..6350089909.81 rows=634262751009 width=8)
  Merge Cond: ("outer".org_id = "inner".org_id)
  -> XN Merge Left Join DS_DIST_NONE (cost=0.00..3667829.03 rows=64777233 width=12)
       Merge Cond: ("outer".org_id = "inner".org_id)
       -> XN Seq Scan on org_hour_host ohh (cost=0.00..131.03 rows=10483 width=12)
            Filter: (org_id = 10)
       -> XN Seq Scan on org_hour_timeseries ohs (cost=0.00..2440253.44 rows=244025344 width=4)
  -> XN Seq Scan on org_hour_vs_host ohah (cost=0.00..2906102.08 rows=290610208 width=4)
(8 rows)
Answer 0 (score: 1)
In the first query, the clause a.some > 5 AND a.thing < 10 excludes every row in which a.some or a.thing is NULL. That effectively turns the joins into LEFT joins.
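As an illustration, the first query should return the same rows as the sketch below, which matches the two Merge Left Join steps in the second plan from the question. This is only a sketch and has not been run against the real tables.

```sql
-- Sketch: the WHERE clause only keeps rows where a.some and a.thing are
-- non-NULL, so rows that come solely from b or c are discarded anyway.
-- Each FULL OUTER JOIN therefore behaves like a LEFT JOIN from a.
SELECT a.id, b.id, c.id
FROM a
LEFT JOIN b ON a.id = b.id
LEFT JOIN c ON a.id = c.id
WHERE a.some > 5 AND a.thing < 10;
```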
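In the second query, a.some and a.thing can be NULL as long as, for example, b.some > 5 AND b.thing < 10 is true. So now the FULL JOIN really is a FULL JOIN and can produce many more rows. On top of that, OR in a WHERE clause is relatively slow.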
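One common way to avoid the OR is to split the query into one branch per table and combine the branches with UNION, so that each branch has a single-table filter and only needs LEFT joins. The sketch below assumes that id is unique within each table. UNION removes duplicate rows, so the result can differ from the original query if ids repeat. It is untested, and whether it is actually faster depends on the data and the planner.

```sql
-- Assumption: id is unique within a, b and c (UNION collapses duplicates).

-- Branch 1: rows where the a-side filter passes.
SELECT a.id AS a_id, b.id AS b_id, c.id AS c_id
FROM a
LEFT JOIN b ON a.id = b.id
LEFT JOIN c ON a.id = c.id
WHERE a.some > 5 AND a.thing < 10
UNION
-- Branch 2: rows where the b-side filter passes; a may be missing.
SELECT a.id, b.id, c.id
FROM b
LEFT JOIN a ON a.id = b.id
LEFT JOIN c ON a.id = c.id
WHERE b.some > 5 AND b.thing < 10
UNION
-- Branch 3: rows where the c-side filter passes; c joins through a.id,
-- exactly as in the original ON a.id = c.id condition.
SELECT a.id, b.id, c.id
FROM c
LEFT JOIN (a LEFT JOIN b ON a.id = b.id) ON a.id = c.id
WHERE c.some > 5 AND c.thing < 10;
```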
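In theory, you can apply the conditions first, before joining, so there are fewer rows to filter and fewer rows to join. Untested, but it would look something like this: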
SELECT a.id, b.id, c.id
FROM (SELECT * FROM tbl_a
WHERE some > 5 AND thing < 10) a
FULL OUTER JOIN (SELECT * FROM tbl_b
WHERE some > 5 AND thing < 10) b ON a.id = b.id
FULL OUTER JOIN (SELECT * FROM tbl_c
WHERE some > 5 AND thing < 10) c ON a.id = c.id
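Applied to the actual tables from the question, the same idea might look like the sketch below. The table and column names come from the question's queries; selecting only the needed columns instead of SELECT * is my own choice. Here the filter is on the join key itself (org_id), so pre-filtering each table should give the same rows as the OR version. For the generic a/b/c query that is not strictly true: a row of b that fails its own filter is no longer returned next to a matching row of a that passes. Untested.

```sql
-- Sketch: filter each table on org_id before joining, so the join only
-- sees the rows for org_id = 10 instead of the full tables.
SELECT ohh.hour
FROM (SELECT org_id, hour
      FROM org_hour_host
      WHERE org_id = 10) ohh
FULL OUTER JOIN (SELECT org_id
                 FROM org_hour_timeseries
                 WHERE org_id = 10) ohs ON ohh.org_id = ohs.org_id
FULL OUTER JOIN (SELECT org_id
                 FROM org_hour_vs_host
                 WHERE org_id = 10) ohah ON ohh.org_id = ohah.org_id;
```

Running EXPLAIN on this form and comparing it with the two plans in the question would show whether the Seq Scan steps now carry a Filter: (org_id = 10) line on all three tables.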