Redshift: join vs. union with group by

Time: 2017-01-03 17:21:54

Tags: sql performance join union amazon-redshift

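Let's say I want to pull the fields dim, a, b, c, d from 2 tables, where one table holds a, b and the other holds c, d.

I'm wondering whether one of the following two approaches is preferred, performance-wise: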

1:

select t1.dim, a, b, c, d
from
(select dim, sum(a) as a, sum(b) as b from t1 group by dim) t1
join
(select dim, sum(c) as c, sum(d) as d from t2 group by dim) t2
on t1.dim = t2.dim;

2:

select dim,sum(a) as a,sum(b) as b,sum(c) as c,sum(d) as d
from 
(
select dim,a,b,null as c, null as d from t1
union
select dim,null as a, null as b, c, d from t2
) a
group by dim;

This is, of course, for large amounts of data (5-30M records in the final query).

Thanks!

1 answer:

Answer 0: (score: 0)

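The first method is an inner join, so it filters out any dim values that are not present in both tables. union is inefficient because it has to remove duplicate rows. So neither is attractive.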
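Besides the cost, removing duplicates can also change the sums. A minimal sketch, assuming made-up sample rows (the values in the two CTEs are hypothetical, not from the question):

```sql
-- Hypothetical data: t1 contains two identical rows for dim = 'x'.
with t1 as (
    select 'x' as dim, 1 as a, 2 as b
    union all
    select 'x' as dim, 1 as a, 2 as b   -- exact duplicate of the first row
),
t2 as (
    select 'x' as dim, 10 as c, 20 as d
)
select dim, sum(a) as a, sum(b) as b, sum(c) as c, sum(d) as d
from (select dim, a, b, null as c, null as d from t1
      union        -- collapses the two identical t1 rows into one
      select dim, null as a, null as b, c, d from t2
     ) u
group by dim;
-- With union, only one of the duplicate t1 rows is summed.
-- With union all in its place, both rows are kept and a and b are doubled.
```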

I would go with:

select dim, sum(a) as a, sum(b) as b, sum(c) as c, sum(d) as d
from (select dim, a, b, null as c, null as d from t1
      union all
      select dim, null as a, null as b, c, d from t2
     ) a
group by dim;

You can also pre-aggregate the values in each subquery. Or use a full outer join for the first method.
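The two alternatives mentioned above could be sketched as follows. This is only an illustration using the table and column names from the question; the subquery alias u and the coalesce on dim are my additions, and which variant is faster on a given cluster would need to be measured.

```sql
-- Variant A: pre-aggregate each table, then combine with union all.
-- The outer group by merges the (at most) two rows per dim.
select dim, sum(a) as a, sum(b) as b, sum(c) as c, sum(d) as d
from (select dim, sum(a) as a, sum(b) as b, null as c, null as d
      from t1
      group by dim
      union all
      select dim, null as a, null as b, sum(c) as c, sum(d) as d
      from t2
      group by dim
     ) u
group by dim;

-- Variant B: the first method with a full outer join, so dim values
-- that appear in only one table are kept.
-- coalesce picks whichever side actually has the dim value.
select coalesce(t1.dim, t2.dim) as dim, a, b, c, d
from (select dim, sum(a) as a, sum(b) as b from t1 group by dim) t1
full outer join
     (select dim, sum(c) as c, sum(d) as d from t2 group by dim) t2
  on t1.dim = t2.dim;
```

One difference to keep in mind: if dim can be null, the join condition never matches null to null, so Variant B returns separate rows for the null group of each table, while Variant A merges them into one.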