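Because my query uses count and distinct together, Hive runs it with only one reducer, which is a problem. How can I rewrite the select to avoid this? Is it possible with window functions?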
select
a.second_id,
if(a.proc_id = 'CONST1' and bb.third_id is not null,
count(distinct bb.first_id),
'') as qty
from a a
join (select
b.first_id,
b.second_id,
b.third_id
from b b) bb
on bb.second_id = a.second_id
group by
a.second_id,
a.proc_id,
bb.third_id;
Answer 0 (score: 1)
This is your query:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then count(distinct bb.first_id)
end) as qty
from a join
(select b.first_id, b.second_id, b.third_id
from b
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
Actually, count(distinct) can be handled in a subquery using group by and window functions. I don't see any value in not aggregating first, so:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then max(bb.num_firsts)
end) as qty
from a join
(select b.second_id, b.third_id,
count(distinct first_id) as num_firsts
from b
group by b.second_id, b.third_id
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
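The outer query aggregates by second_id and third_id (along with proc_id). The aggregated subquery returns exactly one row per (second_id, third_id) pair, so each outer group sees only a single num_firsts value. The version above uses max(bb.num_firsts), but you could also add bb.num_firsts to the outer group by.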
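As a sketch of that alternative, here is the same query with bb.num_firsts added to the outer group by instead of being wrapped in max(). It assumes the same tables a and b as above and is untested against a real Hive cluster:

```sql
-- Sketch: group by num_firsts instead of aggregating it with max().
-- Each (second_id, third_id) pair has exactly one num_firsts,
-- so adding it to the group by does not split any group.
select a.second_id,
       (case when a.proc_id = 'CONST1' and bb.third_id is not null
             then bb.num_firsts
        end) as qty
from a join
     (select b.second_id, b.third_id,
             count(distinct b.first_id) as num_firsts
      from b
      group by b.second_id, b.third_id
     ) bb
     on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id, bb.num_firsts;
```

Both forms should return the same rows; which one reads better is mostly a matter of taste.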
That may still not fix your problem, but this query is easier to modify. As I recall, the best approach in Hive is a select distinct subquery:
select a.second_id,
(case when a.proc_id = 'CONST1' and bb.third_id is not null
then max(bb.num_firsts)
end) as qty
from a join
(select b.second_id, b.third_id,
count(*) as num_firsts
from (select distinct second_id, third_id, first_id
from b
) b
group by b.second_id, b.third_id
) bb
on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
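This is the same thing if first_id is never null. If it can be null, this version counts null as a separate value, which count(distinct) does not; if you don't want that, just filter the nulls out.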
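A minimal sketch of that filter, assuming the same tables a and b; the only change from the query above is the where clause in the innermost subquery:

```sql
-- Sketch: drop null first_id values before deduplicating, so that count(*)
-- matches count(distinct first_id) for every group that has at least one
-- non-null first_id.
select a.second_id,
       (case when a.proc_id = 'CONST1' and bb.third_id is not null
             then max(bb.num_firsts)
        end) as qty
from a join
     (select b.second_id, b.third_id,
             count(*) as num_firsts
      from (select distinct second_id, third_id, first_id
            from b
            where first_id is not null
           ) b
      group by b.second_id, b.third_id
     ) bb
     on bb.second_id = a.second_id
group by a.second_id, a.proc_id, bb.third_id;
```

One caveat: with the where filter, a (second_id, third_id) pair whose first_id values are all null disappears from bb entirely, whereas count(distinct first_id) would report 0 for it. If those rows matter, an alternative is to leave the filter out and use count(b.first_id) instead of count(*), since count over a column skips nulls.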