max(value) across multiple columns in Hive

Time: 2018-11-29 13:42:59

Tags: hive hiveql

For example:

ID   dt_col_1    dt_col_2    dt_col_3  
1    09-10-2018  08-10-2018  10-10-2018  
1    10-10-2018  null        11-10-2018  
1    11-10-2018  10-10-2018  12-10-2018  
2    null        08-10-2018  12-10-2018  
2    10-10-2018  13-10-2018  09-10-2018  

Desired result:

ID   dt_col_1    dt_col_2    dt_col_3  
1    null        null        12-10-2018  
2    null        13-10-2018  null  

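Hive has a greatest function that returns the largest value among several columns of a single row. How can the same be done across multiple rows, as in the example above?

1 answer:

Answer 0 (score: 0)

First apply a GROUP BY to get the maximum date of each column per id. Then apply the greatest function to that result, or CASE expressions if the value should stay in its own column.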

create table test_stackof_greatest (id int, dt1 date, dt2 date, dt3 date) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

insert into test_stackof_greatest values (1, '2018-10-09', '2018-10-08', '2018-10-10');
insert into test_stackof_greatest values (1, '2018-10-10', null, '2018-10-11'), (1, '2018-11-10', '2018-10-10', '2018-10-12');
insert into test_stackof_greatest values (2, null, '2018-10-08', '2018-10-12'), (2, '2018-10-10', '2018-10-13', '2018-10-09');

-- inner query: per-id max of each column; outer query: keep only the column that holds the overall max
select id,
       case when dt1 > dt2 and dt1 > dt3 then dt1 else null end,
       case when dt2 > dt1 and dt2 > dt3 then dt2 else null end,
       case when dt3 > dt2 and dt3 > dt1 then dt3 else null end,
       greatest(dt1, dt2, dt3)
from (select id, max(dt1) as dt1, max(dt2) as dt2, max(dt3) as dt3 from test_stackof_greatest group by id) a;

Output
OK
1       2018-11-10      NULL    NULL    2018-11-10
2       NULL    2018-10-13      NULL    2018-10-13
Time taken: 20.467 seconds, Fetched: 2 row(s)
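
To check the logic without a Hive session, the two steps (group by id, then keep only the column holding the overall maximum) can be sketched in plain Python. This is an illustration, not part of the original answer: the names ROWS, column_maxima and keep_only_greatest are hypothetical. It also deviates from the query in two labeled ways: it skips missing values when picking the overall maximum, and on a tie it keeps the value in every tied column.

```python
from datetime import date

# Rows copied from the INSERT statements above: (id, dt1, dt2, dt3).
ROWS = [
    (1, date(2018, 10, 9), date(2018, 10, 8), date(2018, 10, 10)),
    (1, date(2018, 10, 10), None, date(2018, 10, 11)),
    (1, date(2018, 11, 10), date(2018, 10, 10), date(2018, 10, 12)),
    (2, None, date(2018, 10, 8), date(2018, 10, 12)),
    (2, date(2018, 10, 10), date(2018, 10, 13), date(2018, 10, 9)),
]


def column_maxima(rows):
    """Step 1: per-id maximum of each column, skipping None the way MAX() skips NULL."""
    result = {}
    for row_id, *dates in rows:
        current = result.setdefault(row_id, [None] * len(dates))
        for i, d in enumerate(dates):
            if d is not None and (current[i] is None or d > current[i]):
                current[i] = d
    return result


def keep_only_greatest(rows):
    """Step 2: keep the overall maximum in its own column and None elsewhere.

    Assumption (differs from the query): None values are ignored when picking
    the overall maximum, and tied columns all keep the value.
    """
    out = []
    for row_id, maxima in sorted(column_maxima(rows).items()):
        present = [d for d in maxima if d is not None]
        overall = max(present) if present else None
        masked = [d if d is not None and d == overall else None for d in maxima]
        out.append((row_id, *masked, overall))
    return out
```

Calling keep_only_greatest(ROWS) on the data above gives the same two rows as the Hive output: 2018-11-10 in the first date column for id 1, and 2018-10-13 in the second date column for id 2.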

Hope this helps.