I know that a Hive SQL query like this:
SELECT users, users > 0 AS have_user
FROM (
SELECT SUM(user) AS users
FROM sometable
GROUP BY something
) t;
will create a single MapReduce job, which is fine. However, I would like to avoid too many subqueries in my code. For example:
SELECT SUM(user) AS user, SUM(user) > 0 AS have_user
FROM sometable
GROUP BY something;
In the code above, will Hive compute this SUM aggregate once or twice?
Answer 0 (score: 1)
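Hive will not run two separate jobs in the map/reduce phase, and it will not compute the aggregate twice; it is computed only once. You can check this by looking at the execution plan: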
explain
SELECT users, users > 0 AS have_user
FROM (
SELECT SUM(user) AS users
FROM sometable
GROUP BY something
) t;
You should see only one aggregation, like this:
Group By Operator
aggregations: sum(VALUE._col0)
The aggregate result is then reused for the condition in your SELECT list:
Select Operator
expressions: _col1 (type: bigint), (_col1 > 0) (type: boolean)
outputColumnNames: _col0, _col1
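The same check can be run against the flat query from the question, which is the form actually being asked about. This is a minimal sketch that assumes the table and column names used in the question (sometable, user, something); the exact plan text depends on the Hive version and execution engine, so treat the plan shown above as indicative only.

```sql
-- Ask Hive for the plan of the flat (no-subquery) query without running it.
-- Assumption: if your Hive version treats `user` as a reserved word,
-- quote it with backticks.
EXPLAIN
SELECT SUM(user) AS users, SUM(user) > 0 AS have_user
FROM sometable
GROUP BY something;
```

If the Group By Operator lists a single sum(...) aggregation and the comparison appears only in the Select Operator, the aggregate is being computed once and reused.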
Answer 1 (score: 0)
I don't know how Hive would interpret that query, but I would use a HAVING clause to correct it. This is the corrected version of the query:
SELECT something, SUM(user) AS have_user
FROM sometable
GROUP BY something
HAVING SUM(user) > 0;