Replacing NULL values with values from the same column in Hive using COALESCE

Time: 2018-08-13 10:24:02

Tags: hive

I want to replace the NULL values in a particular column with values from that same column.

I tried the following:

select
    d_day,
    COALESCE(
        val,
        LAST_VALUE(val, TRUE) OVER (ORDER BY d_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ) as val
from data_table

sample data

1 Answer:

Answer 0: (score: 0)

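One way to do this is with two window functions. The first computes a running sum of val from the current row to the last row; because SUM ignores NULLs, a run of NULL rows shares its sum with the next non-NULL row, so the sum can serve as a group key. The second takes the MAX of val within each group. Note that this fills each NULL with the next non-NULL value. Here is an example: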

with tmp_table as (
  select 1 as ts, 3 as val 
  union all
  select 2 as ts, NULL as val
  union all 
  select 3 as ts, NULL as val
  union all
  select 4 as ts, 4 as val
  union all
  select 5 as ts, NULL as val
  union all
  select 6 as ts, 5 as val
  union all 
  select 7 as ts, 6 as val
)
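, rank_table as (
  -- rnk: sum of val from this row to the last row; NULL rows inherit the sum of the next non-NULL row
  select *, SUM(val) OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
    from tmp_table
)
-- For positive values each rnk group holds at most one non-NULL val; MAX copies it to every row in the group
select *, max(val) over (partition by rnk) as filled_val
  from rank_table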

So in your case:

with rank_table as (
  select *, SUM(val) OVER (ORDER BY d_day ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
    from data_table
)
select *, max(val) over (partition by rnk) as filled_val
  from rank_table
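
The queries above fill each NULL with the next non-NULL value, and a SUM-based key can mis-group rows when val contains zeros or values of mixed sign. As a hedged alternative that is not part of the original answer, the sketch below counts the non-NULL values from the first row up to the current row, so each NULL takes the most recent preceding value, which is the direction the LAST_VALUE attempt in the question was aiming for. The names grp_table, grp and filled_val are illustrative, and the inline rows reuse the sample values from the answer under the question's column name d_day.

```sql
-- Forward fill sketch: each NULL takes the latest non-NULL val at or before its row.
with data_table as (
  select 1 as d_day, 3 as val
  union all select 2 as d_day, NULL as val
  union all select 3 as d_day, NULL as val
  union all select 4 as d_day, 4 as val
  union all select 5 as d_day, NULL as val
  union all select 6 as d_day, 5 as val
  union all select 7 as d_day, 6 as val
)
, grp_table as (
  -- COUNT(val) skips NULLs, so grp increases only on non-NULL rows;
  -- a non-NULL row and the NULL rows that follow it share one grp.
  select d_day, val,
         COUNT(val) OVER (ORDER BY d_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as grp
    from data_table
)
-- Each grp holds at most one non-NULL val; MAX copies it to the whole group.
select d_day, val,
       MAX(val) OVER (PARTITION BY grp) as filled_val
  from grp_table
 order by d_day
```

On these sample rows filled_val comes out as 3, 3, 3, 4, 4, 5, 6 for d_day 1 through 7. NULLs that precede the first non-NULL row would fall into grp 0 and stay NULL, since there is no earlier value to copy.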

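Keep in mind that the ORDER BY d_day in the first window, which has no PARTITION BY, makes the job run on a single reducer, so if your data is really large it may take a while to finish.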
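
If the data has a natural grouping key, adding PARTITION BY to both windows lets Hive distribute the rows across reducers by that key. The following is only a sketch under an assumption: series_id is a hypothetical column that does not appear in the question, and it only makes sense if a NULL should never be filled with a value from a different series.

```sql
-- series_id is a hypothetical grouping column, assumed for illustration only.
with rank_table as (
  select *,
         SUM(val) OVER (PARTITION BY series_id
                        ORDER BY d_day
                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
    from data_table
)
-- Include series_id in the second partition so equal rnk values
-- from different series are never merged into one group.
select *,
       MAX(val) OVER (PARTITION BY series_id, rnk) as filled_val
  from rank_table
```

Without such a key, the single-reducer cost is inherent to a global ordering over the whole table.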