SQL JOIN with WHERE clauses on multiple tables is slow

Time: 2017-12-01 19:00:54

Tags: sql postgresql join

The following query is fast:

SELECT a.id, b.id, c.id
FROM a
FULL OUTER JOIN b ON a.id = b.id
FULL OUTER JOIN c ON a.id = c.id
WHERE a.some > 5 AND a.thing < 10

However, putting WHERE conditions on multiple tables causes a large performance hit on my dataset of roughly 10 million rows:

SELECT a.id, b.id, c.id
FROM a
FULL OUTER JOIN b ON a.id = b.id
FULL OUTER JOIN c ON a.id = c.id
WHERE (a.some > 5 AND a.thing < 10)
    OR (b.some > 5 AND b.thing < 10)
    OR (c.some > 5 AND c.thing < 10)

How can I rewrite the query to make it more efficient? Thanks!

Edit:

Here is the EXPLAIN output for the actual queries (the table names are a bit different):

SELECT
    ohh.hour
FROM org_hour_host ohh
FULL OUTER JOIN org_hour_timeseries ohs ON ohh.org_id = ohs.org_id
FULL OUTER JOIN org_hour_vs_host ohah ON ohh.org_id = ohah.org_id
WHERE (ohh.org_id IN (10) OR ohs.org_id IN (10) OR ohah.org_id IN (10))

XN Hash Full Join DS_DIST_OUTER  (cost=6682944.40..234919986923528960.00 rows=1934276754413 width=8)
  Outer Dist Key: "outer".org_id
  Hash Cond: ("outer".org_id = "inner".org_id)
  Filter: (("inner".org_id = 10) OR ("outer".org_id = 10) OR ("outer".org_id = 10))
  ->  XN Hash Full Join DS_DIST_NONE  (cost=3050316.80..38694799792.93 rows=1934276754413 width=16)
        Hash Cond: ("outer".org_id = "inner".org_id)
        ->  XN Seq Scan on org_hour_host ohh  (cost=0.00..3130270.08 rows=313027008 width=12)
        ->  XN Hash  (cost=2440253.44..2440253.44 rows=244025344 width=4)
              ->  XN Seq Scan on org_hour_timeseries ohs  (cost=0.00..2440253.44 rows=244025344 width=4)
  ->  XN Hash  (cost=2906102.08..2906102.08 rows=290610208 width=4)
        ->  XN Seq Scan on org_hour_vs_host ohah  (cost=0.00..2906102.08 rows=290610208 width=4)
(11 rows)





SELECT
    ohh.hour
FROM org_hour_host ohh
FULL OUTER JOIN org_hour_timeseries ohs ON ohh.org_id = ohs.org_id
FULL OUTER JOIN org_hour_vs_host ohah ON ohh.org_id = ohah.org_id
WHERE ohh.org_id IN (10)

XN Merge Left Join DS_DIST_NONE  (cost=0.00..6350089909.81 rows=634262751009 width=8)
  Merge Cond: ("outer".org_id = "inner".org_id)
  ->  XN Merge Left Join DS_DIST_NONE  (cost=0.00..3667829.03 rows=64777233 width=12)
        Merge Cond: ("outer".org_id = "inner".org_id)
        ->  XN Seq Scan on org_hour_host ohh  (cost=0.00..131.03 rows=10483 width=12)
              Filter: (org_id = 10)
        ->  XN Seq Scan on org_hour_timeseries ohs  (cost=0.00..2440253.44 rows=244025344 width=4)
  ->  XN Seq Scan on org_hour_vs_host ohah  (cost=0.00..2906102.08 rows=290610208 width=4)
(8 rows)

1 answer:

Answer 0 (score: 1)

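In the first query, the clause a.some > 5 AND a.thing < 10 excludes every row where a.some or a.thing is NULL. That effectively turns the joins into LEFT joins. In the second query, a.some and a.thing can be NULL, for example when b.some > 5 AND b.thing < 10 is true. So now the FULL JOIN really is a FULL JOIN, which can produce many more rows. Finally, OR in a WHERE clause is relatively slow.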
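
To make the first point concrete, here is a sketch of what the fast query is effectively reduced to. It uses the table and column names from the question and only illustrates the reasoning; it is not output from the planner, and whether a given engine performs this simplification is an assumption you should confirm with EXPLAIN.

```sql
-- The WHERE clause can only be true when a's columns are non-NULL,
-- so rows that exist only in b or only in c can never survive it.
-- A FULL OUTER JOIN that cannot keep those rows behaves like a LEFT JOIN.
SELECT a.id, b.id, c.id
FROM a
LEFT JOIN b ON a.id = b.id
LEFT JOIN c ON a.id = c.id
WHERE a.some > 5 AND a.thing < 10;
```

This matches the second plan in the question, where both joins show up as "Merge Left Join" and the filter is applied directly in the scan of org_hour_host.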

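In theory, you can apply the conditions before joining, so that fewer rows have to be filtered and fewer rows have to be joined. Untested, but it would look something like this: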

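-- tbl_a, tbl_b, tbl_c stand for the question's tables a, b, c
SELECT a.id, b.id, c.id
FROM (SELECT * FROM tbl_a
      -- columns are table-qualified because SOME is a reserved word in PostgreSQL
      WHERE tbl_a.some > 5 AND tbl_a.thing < 10) a
FULL OUTER JOIN (SELECT * FROM tbl_b
      WHERE tbl_b.some > 5 AND tbl_b.thing < 10) b ON a.id = b.id
FULL OUTER JOIN (SELECT * FROM tbl_c
      WHERE tbl_c.some > 5 AND tbl_c.thing < 10) c ON a.id = c.id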