Treat missing rows as having 0 data

Asked: 2014-01-23 20:45:35

Tags: sql hive

Suppose I have a table describing workers' contributions to projects:

project worker contribution
1       1      1
1       2      4
2       1      4

To compute each worker's relative contribution to a project, I can run

select t.project, t.worker, t.contribution, p.total, 
       t.contribution / p.total as relative 
from my_table t
join (select project, sum(contribution) as total
      from my_table group by project) p
on t.project = p.project

to get new_table:

project worker contribution total relative
1       1      1            5     .2
1       2      4            5     .8
2       1      4            4     1

If I now compute the average relative contribution with

select worker, avg(relative) as avg_rel
from new_table group by worker

I get

worker avg_rel
1      .6
2      .8

This ignores worker 2's (zero) contribution to project 2, because the table has no row for that pair.

How can I take it into account? That is, I would like to get

worker avg_rel
1      .6
2      .4

as if the original table also contained the row

2       2      0

Thanks.

2 Answers:

Answer 0 (score: 1)

Can't you use a left outer join like this? Perhaps joining against a list of workers?

-- pair every project with every worker; 0 where my_table has no row
select p.project, w.worker, coalesce(t.contribution, 0) as contribution, p.total,
   case when coalesce(p.total, 0) = 0 then 0
        else coalesce(t.contribution, 0) / p.total end as relative
from (select project, sum(contribution) as total
      from my_table group by project) p
cross join (select distinct worker
            from my_table) w
left outer join my_table t
on t.project = p.project and t.worker = w.worker
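
If only the per-worker average is needed, the missing rows do not have to be generated at all. A missing row adds 0 to the sum, so it is enough to sum each worker's relative contributions and divide by the total number of projects. This is a different technique from the join above, shown only as a sketch. It assumes the `my_table(project, worker, contribution)` layout from the question; the aliases `r`, `n`, and `num_projects` are made up for illustration.

```sql
-- Sketch: divide the sum of relative contributions by the number of
-- projects, instead of materializing the missing (project, worker) rows.
select r.worker,
       sum(r.relative) / n.num_projects as avg_rel
from (select t.worker,
             -- "* 1.0" guards against integer division outside Hive
             t.contribution * 1.0 / p.total as relative
      from my_table t
      join (select project, sum(contribution) as total
            from my_table group by project) p
        on t.project = p.project) r
cross join (select count(distinct project) as num_projects
            from my_table) n   -- single-row subquery
group by r.worker, n.num_projects;
```

With contributions of 1, 4 and 4 (the values shown in new_table), this should give 1.2 / 2 = .6 for worker 1 and .8 / 2 = .4 for worker 2.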

Answer 1 (score: 1)

I'm not sure whether this works in Hive, but here is a SQL solution:

select w.worker, avg(coalesce(t.relative, 0.0)) as avg_rel
from (select distinct project from my_table) p cross join
     (select distinct worker from my_table) w left outer join
     new_table t
     on t.project = p.project and t.worker = w.worker
group by w.worker;
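
The same idea can probably be written as a single query against `my_table`, without first storing `new_table`. The following is a minimal sketch under that assumption, not a tested Hive script. It relies on `cross join`, which older Hive versions may not accept, and on the `my_table(project, worker, contribution)` layout from the question.

```sql
-- Sketch: build every (project, worker) pair, attach the project total,
-- then left-join the real rows so that missing pairs count as 0.
select w.worker,
       -- "* 1.0" guards against integer division in dialects other than Hive
       avg(coalesce(t.contribution, 0) * 1.0 / p.total) as avg_rel
from (select project, sum(contribution) as total
      from my_table
      group by project) p
cross join (select distinct worker from my_table) w
left outer join my_table t
  on t.project = p.project and t.worker = w.worker
group by w.worker;
```

With contributions of 1, 4 and 4 (the values shown in new_table), worker 1 averages .2 and 1 to get .6, and worker 2 averages .8 and 0 to get .4. A project whose total is 0 would need separate handling, for example the `case when` guard used in the other answer.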