Suppose I have a table describing workers' contributions to projects:
project worker contribution
1 1 1
1 2 4
2 1 4
To compute each worker's relative contribution within a project, I can run
select t.project, t.worker, t.contribution, p.total,
t.contribution / p.total as relative
from my_table t
join (select project, sum(contribution) as total
from my_table group by project) p
on t.project = p.project
to get new_table:
project worker contribution total relative
1 1 1 5 .2
1 2 4 5 .8
2 1 4 4 1
If I now compute the average relative contribution per worker with
select worker, avg(relative) as avg_rel
from new_table group by worker
I get
worker avg_rel
1 .6
2 .8
which ignores the fact that worker 2 contributed nothing to project 2.
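For anyone who wants to reproduce this, here is a minimal sketch that runs the two queries above through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, and the contribution values (1, 4, 4) are the ones shown in the new_table listing. One assumption worth flagging: SQLite truncates integer / integer, so the sketch multiplies by 1.0, whereas Hive's / operator returns a double as far as I know.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (project int, worker int, contribution int)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(1, 1, 1), (1, 2, 4), (2, 1, 4)],  # values from the new_table listing
)

# Build new_table. The "* 1.0" forces floating-point division in SQLite;
# it is an addition for this sketch, not part of the original query.
conn.execute("""
    create table new_table as
    select t.project, t.worker, t.contribution, p.total,
           t.contribution * 1.0 / p.total as relative
    from my_table t
    join (select project, sum(contribution) as total
          from my_table group by project) p
      on t.project = p.project
""")

# Naive average: only (project, worker) pairs that have a row are counted,
# so worker 2 is averaged over one project instead of two.
naive = conn.execute(
    "select worker, avg(relative) as avg_rel from new_table "
    "group by worker order by worker"
).fetchall()
print(naive)
```

Worker 2 should come out at about 0.8 rather than 0.4, because there is no (project 2, worker 2) row for avg to see.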
How can I take that into account? That is, I would like to get
worker avg_rel
1 .6
2 .4
as if the original table also contained the row
2 2 0
Thanks.
Answer 0 (score: 1)
Couldn't you use a left outer join, like this? Perhaps joining against a list of workers?
select p.project, w.worker, coalesce(t.contribution, 0) as contribution, p.total,
case when coalesce(p.total, 0) = 0 then 0
else coalesce(t.contribution, 0) / p.total end as relative
from (select project, sum(contribution) as total
from my_table group by project) p
cross join (select distinct worker from my_table) w
-- the left outer join keeps (project, worker) pairs that have no row in my_table
left outer join my_table t
on t.project = p.project and t.worker = w.worker
Answer 1 (score: 1)
I'm not sure whether this works in Hive, but here is a SQL solution:
select w.worker, avg(coalesce(t.relative, 0.0)) as avg_rel
from (select distinct project from my_table) p cross join
(select distinct worker from my_table) w left outer join
new_table t -- relative exists only in new_table, not in my_table
on t.project = p.project and t.worker = w.worker
group by w.worker;
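As a quick sanity check of the cross join idea, here is a rough sketch using Python's sqlite3 module in place of Hive. It is not tested on Hive. To keep it self-contained it loads the relative values from the question straight into new_table and takes the distinct projects and workers from that table too. It also reads relative from new_table, since my_table has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table new_table (project int, worker int, relative real)")
conn.executemany(
    "insert into new_table values (?, ?, ?)",
    [(1, 1, 0.2), (1, 2, 0.8), (2, 1, 1.0)],  # relative values from the question
)

# Every project is paired with every worker; pairs without a matching row
# get NULL from the left join, which coalesce turns into 0.0 before averaging.
rows = conn.execute("""
    select w.worker, avg(coalesce(t.relative, 0.0)) as avg_rel
    from (select distinct project from new_table) p
    cross join (select distinct worker from new_table) w
    left outer join new_table t
      on t.project = p.project and t.worker = w.worker
    group by w.worker
    order by w.worker
""").fetchall()
print(rows)
```

With this data, worker 1 should average about 0.6 and worker 2 about 0.4, which matches the result the question asks for.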