Query takes a long time to run. Is there any way to simplify it?

Time: 2013-10-04 22:39:06

Tags: sql hive hue

I am running the code below, and it takes a very long time to finish. Is there a way to make it run faster?

SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien

FROM table1 a

INNER JOIN table1 b

ON a.data_date = b.data_date AND a.column3 = b.column3

WHERE a.data_date ='20131001'
and a.column3 = 12345
and a.column4 is not NULL
and b.column4 is NULL

GROUP BY
a.data_date

3 Answers:

Answer 0: (score: 1)

As far as I can tell, you don't need the JOIN at all. You can get the same result with a single reference to the table.
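One way to read "a single reference" is conditional aggregation: scan table1 once and split the rows on column4 with CASE. The sketch below assumes the goal is one sum over the non-NULL rows and one over the NULL rows. Note that it is not row-for-row identical to the self-join: the join pairs every non-NULL row with every NULL row, so its sums are multiplied by the row count of the other side, and it returns no row at all when either side is empty.

```sql
-- Sketch only: a single pass over table1, no self-join.
-- Rows with column4 IS NOT NULL feed part1/part2/total;
-- rows with column4 IS NULL feed alien.
SELECT
  data_date AS day
, sum(CASE WHEN column4 IS NOT NULL THEN column1 END)
  + sum(CASE WHEN column4 IS NOT NULL THEN column2 END) AS total
, sum(CASE WHEN column4 IS NOT NULL THEN column1 END) AS part1
, sum(CASE WHEN column4 IS NOT NULL THEN column2 END) AS part2
, sum(CASE WHEN column4 IS NULL THEN column1 END) AS alien
FROM table1
WHERE data_date = '20131001'
  AND column3 = 12345
GROUP BY
  data_date
```

If the multiplied totals from the original join are actually what is wanted, this version will not match them, so it is worth comparing both on a small date first.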

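Answer 1: (score: 0)

Since this is the same table, I believe you can remove the join. It would be best to provide sample data and the expected result so that we can help you better. Cheers =)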

SELECT
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
--remove this
--, sum(b.column1) as alien

FROM table1 a

--remove this
--INNER JOIN table1 b

--ON a.data_date = b.data_date AND a.column3 = b.column3

WHERE a.data_date ='20131001'
and a.column3 = 12345


and a.column4 is not NULL
--remove this
--and b.column4 is NULL

GROUP BY
a.data_date,a.column3

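Answer 2: (score: 0)

The right optimization technique also depends on the size of the tables.

The small table should come first in the join, and you should try to place that table in the distributed cache.

To speed things up, apply the filter conditions before the join rather than after it, so that the join has fewer rows to process.

You can try something like the following: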

-- let Hive convert the join to a map-side join when one side is small enough
set hive.auto.convert.join=true;
select
a.data_date as day
, sum(a.column1) + sum(a.column2) as total
, sum(a.column1) as part1
, sum(a.column2) as part2
, sum(b.column1) as alien
from table1 b
-- filter one side before the join; the alias a only exists outside the subquery
inner join (select * from table1 WHERE data_date ='20131001'
and column3 = 12345
and column4 is not NULL
)a
on (a.data_date = b.data_date AND a.column3 = b.column3)

where b.column4 is NULL
GROUP BY
a.data_date
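
To see whether Hive really turns this into a map-side join, the plan can be inspected with EXPLAIN. This is a rough sketch: the threshold value shown is only illustrative and should be treated as an assumption about the cluster, and because Hive generally decides from the size of the underlying table rather than the filtered subquery, a self-join on a large table1 may still run as an ordinary shuffle join.

```sql
set hive.auto.convert.join=true;
-- size in bytes below which a table counts as "small"; illustrative value
set hive.mapjoin.smalltable.filesize=25000000;

-- EXPLAIN prints the plan without running the query;
-- a map-side join should appear as a "Map Join Operator" stage
EXPLAIN
select
a.data_date as day
, sum(a.column1) as part1
, sum(b.column1) as alien
from table1 b
inner join (select * from table1 WHERE data_date ='20131001'
and column3 = 12345
and column4 is not NULL
)a
on (a.data_date = b.data_date AND a.column3 = b.column3)
where b.column4 is NULL
GROUP BY
a.data_date;
```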