For example:
ID  dt_col_1    dt_col_2    dt_col_3
1   09-10-2018  08-10-2018  10-10-2018
1   10-10-2018  null        11-10-2018
1   11-10-2018  10-10-2018  12-10-2018
2   null        08-10-2018  12-10-2018
2   10-10-2018  13-10-2018  09-10-2018
Desired result:
ID  dt_col_1    dt_col_2    dt_col_3
1   null        null        12-10-2018
2   null        13-10-2018  null
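Hive has a greatest function that returns the maximum value across multiple columns within a single row. How can the same logic be applied across multiple rows, as in the example above?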
Answer 0 (score: 0)
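First apply a group by to get the maximum date of each column per id, then apply the greatest function to those maxima, or use CASE expressions if you need the result in separate columns.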
create table test_stackof_greatest (id int, dt1 date, dt2 date, d3 date) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
insert into test_stackof_greatest values (1, '2018-10-09', '2018-10-08', '2018-10-10');
-- Note: '2018-11-10' is kept as posted; the question's 11-10-2018 (dd-mm-yyyy) would be '2018-10-11'.
insert into test_stackof_greatest values (1, '2018-10-10', null, '2018-10-11'), (1, '2018-11-10', '2018-10-10', '2018-10-12');
insert into test_stackof_greatest values (2, null, '2018-10-08', '2018-10-12'), (2, '2018-10-10', '2018-10-13', '2018-10-09');
-- Inner query: per-id maximum of each column. Outer query: keep only the column that holds the overall maximum.
select id,
       case when dt1 > dt2 and dt1 > dt3 then dt1 else null end,
       case when dt2 > dt1 and dt2 > dt3 then dt2 else null end,
       case when dt3 > dt2 and dt3 > dt1 then dt3 else null end,
       greatest(dt1, dt2, dt3)
from (select id, max(dt1) as dt1, max(dt2) as dt2, max(d3) as dt3 from test_stackof_greatest group by id) a;
Output
OK
1 2018-11-10 NULL NULL 2018-11-10
2 NULL 2018-10-13 NULL 2018-10-13
Time taken: 20.467 seconds, Fetched: 2 row(s)
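One case the query above does not cover: if a column is NULL in every row of an id, its max is NULL, so comparisons such as dt1 > dt2 evaluate to NULL and every CASE branch returns NULL for that id. A possible NULL-tolerant variant is sketched below. It is only a sketch under assumptions: it reuses the table and column names from this answer, the output aliases dt_col_1..dt_col_3 are taken from the question, and the sentinel date 0001-01-01 is a made-up placeholder that is assumed never to occur in the real data.

```sql
-- Sketch: NULL-tolerant version of the group-by + greatest approach.
-- Assumption: 0001-01-01 is a hypothetical sentinel that never appears as a real date.
select id,
       case when dt1 = g then dt1 end as dt_col_1,
       case when dt2 = g then dt2 end as dt_col_2,
       case when dt3 = g then dt3 end as dt_col_3
from (
  select id, dt1, dt2, dt3,
         -- replace NULL maxima with the sentinel so greatest always gets non-NULL arguments
         greatest(coalesce(dt1, cast('0001-01-01' as date)),
                  coalesce(dt2, cast('0001-01-01' as date)),
                  coalesce(dt3, cast('0001-01-01' as date))) as g
  from (
    -- per-id maximum of each date column (max ignores NULLs)
    select id, max(dt1) as dt1, max(dt2) as dt2, max(d3) as dt3
    from test_stackof_greatest
    group by id
  ) a
) b;
```

Because the comparison uses = rather than >, two columns that tie for the maximum are both shown, whereas the strict comparisons in the original query would show neither. If all three maxima are NULL for an id, all three output columns stay NULL.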
Hope this helps.