Average over non-null columns - Hive

Time: 2018-02-09 18:48:35

Tags: hadoop hive aggregate-functions hiveql

I want to calculate the average income over the 3 most recent years that have a (non-null) value, for example:

employee id    2016  2015 2014 2013  2012  2011  2010
      1         100  NULL 200   50   10     50    50

The average should be (100 + 200 + 50) / 3

employee id    2016  2015 2014 2013  2012   2011 2010
      2        NULL  100  NULL  50    NULL  25   100

The average should be (100 + 50 + 25) / 3

1 Answer:

Answer 0: (score: 1)

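Use union all to get one row per year for each employee. Then rank the rows with the row_number function so that non-null rows come first, newest year first. Then take the average of the first 3 rows. Since avg skips nulls, employees with fewer than 3 values are handled as well.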

-- Average the 3 most recent non-null incomes per employee.
select employee_id, avg(income) as avg_income
from (select employee_id, yr, income
             -- Non-null incomes rank first; within those, newest year first.
             , row_number() over (partition by employee_id
                                  order by cast((income is not null) as int) desc, yr desc) as rnum
      -- Unpivot: one row per employee per year.
      from (select employee_id, 2016 as yr, `2016` as income from tbl
            union all
            select employee_id, 2015 as yr, `2015` as income from tbl
            union all
            select employee_id, 2014 as yr, `2014` as income from tbl
            union all
            select employee_id, 2013 as yr, `2013` as income from tbl
            union all
            select employee_id, 2012 as yr, `2012` as income from tbl
            union all
            select employee_id, 2011 as yr, `2011` as income from tbl
            union all
            select employee_id, 2010 as yr, `2010` as income from tbl
           ) unpivoted
      ) ranked
where rnum <= 3
group by employee_id
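  • When only 2 columns have values, the result is (val1 + val2) / 2.
  • When only one column has a value, the result is that value.
  • When all columns are null, the result is null.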
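
To sanity-check the logic against the sample rows from the question, here is a minimal Python sketch of what the query computes for a single employee. It is not Hive, only an illustration: the function name `avg_latest_non_null` and the list layout (incomes ordered from 2016 down to 2010) are assumptions made for this example.

```python
from typing import Optional, Sequence


def avg_latest_non_null(
    incomes_newest_first: Sequence[Optional[float]], n: int = 3
) -> Optional[float]:
    """Average the n most recent non-null incomes; None if every value is null."""
    # Keep only non-null values; the input order (newest first) is preserved,
    # which matches ordering by "income is not null" desc, then yr desc.
    non_null = [v for v in incomes_newest_first if v is not None]
    latest = non_null[:n]  # same role as "where rnum <= 3"
    if not latest:
        return None  # avg over only nulls yields null
    return sum(latest) / len(latest)


# Sample rows from the question, ordered 2016, 2015, ..., 2010.
employee_1 = [100, None, 200, 50, 10, 50, 50]
employee_2 = [None, 100, None, 50, None, 25, 100]

print(avg_latest_non_null(employee_1))  # (100 + 200 + 50) / 3
print(avg_latest_non_null(employee_2))  # (100 + 50 + 25) / 3
```

The three bullet cases fall out the same way: with two non-null values the slice holds two items, with one it holds one, and with none the function returns None.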