I am running the code below, and it takes a very long time to finish. Is there a way to make it run faster?
SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
FROM table1 a
INNER JOIN table1 b
ON a.data_date = b.data_date AND a.column3 = b.column3
WHERE a.data_date ='20131001'
and a.column3 = 12345
and a.column4 is not NULL
and b.column4 is NULL
GROUP BY
a.data_date
Answer 0: (score: 1)
As far as I can tell, you don't need the JOIN at all.
You can get the same result with a single reference to the table.
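A minimal sketch of what a single-reference version could look like, assuming the intent is to sum column1 and column2 over the rows where column4 is not NULL, and column1 over the rows where column4 is NULL. This is a guess at the intent, not a drop-in replacement: the self-join pairs every non-NULL row with every NULL row for the same data_date and column3, so its sums are scaled by the number of matching rows on the other side, and it returns no row when either side is empty. Check the output against the original query on real data before switching.

```sql
-- Sketch: one scan of table1, using CASE to split rows by column4.
-- Assumes the row multiplication caused by the self-join is not wanted.
SELECT
data_date as day
, sum(CASE WHEN column4 is not NULL THEN column1 END)
  + sum(CASE WHEN column4 is not NULL THEN column2 END) as total
, sum(CASE WHEN column4 is not NULL THEN column1 END) as part1
, sum(CASE WHEN column4 is not NULL THEN column2 END) as part2
-- rows that the original query read through alias b
, sum(CASE WHEN column4 is NULL THEN column1 END) as alien
FROM table1
WHERE data_date ='20131001'
and column3 = 12345
GROUP BY
data_date
```

Because the table is read once and no join is performed, there is no shuffle for the join stage, which is usually where the time goes in a query like this.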
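Answer 1: (score: 0)
Since both sides of the join are the same table, I believe you can remove the join. It would be best if you posted sample data and the expected result so that we can help you better. Cheers =)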
SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
--remove this
--, sum(b.column1) as alien
FROM table1 a
--remove this
--INNER JOIN table1 b
--ON a.data_date = b.data_date AND a.column3 = b.column3
WHERE a.data_date ='20131001'
and a.column3 = 12345
and a.column4 is not NULL
--remove this
--and b.column4 is NULL
GROUP BY
a.data_date,a.column3
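Answer 2: (score: 0)
The right optimization also depends on the size of the tables.
The small table should come first in the join, and you should try to get that table into the distributed cache (a map-side join).
To speed things up, apply the filter conditions before the join rather than after it, so that the join has fewer rows to process.
You can try something like the following: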
set hive.auto.convert.join=true;
select
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
from table1 b
inner join (select * from table1 WHERE data_date ='20131001'
and column3 = 12345
and column4 is not NULL
)a
on (a.data_date = b.data_date AND a.column3 = b.column3)
where b.column4 is NULL
GROUP BY
a.data_date
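The same idea can be taken one step further by filtering both sides before the join and selecting only the columns each side needs. This is a sketch under the assumption that the self-join and its results should be kept exactly as in the question. Hive's optimizer may already push these predicates down on its own, so a speedup is not guaranteed; comparing the EXPLAIN output of both versions is the safest way to tell.

```sql
set hive.auto.convert.join=true;
select
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
from (
  -- b side: only the rows where column4 is NULL, filtered before the join
  select data_date, column3, column1
  from table1
  where data_date ='20131001'
  and column3 = 12345
  and column4 is NULL
) b
inner join (
  -- a side: only the rows where column4 is not NULL, filtered before the join
  select data_date, column3, column1, column2
  from table1
  where data_date ='20131001'
  and column3 = 12345
  and column4 is not NULL
) a
on (a.data_date = b.data_date AND a.column3 = b.column3)
GROUP BY
a.data_date
```

The date and column3 filters are repeated on the b side because the join condition forces b to match them anyway; stating them explicitly lets each subquery read only the rows it needs.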