I have the following data, and I want to get the latest partition time for each ID.
    ID        time
12  10038446  201705102100
13  10038446  201706052100
14  10038446  201706060000
15  10038446  201706060100
16  10103517  201705101700
17  10103517  201705102100
18  10103517  201706052100
19  10103517  201706060100
20  10124464  201701310100
21  10124464  201702210500
22  10124464  201702220500
23  10124464  201703062100
24  10124464  201705102100
25  10124464  201706052100
26  10124464  201706060100
The output I expect is as follows:
15 10038446 201706060100
19 10103517 201706060100
26 10124464 201706060100
37 1019933 201706052100
How can I achieve this with a Hive query?
Answer 0 (score: 0)
Try this:
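-- Rank each ID's rows from newest to oldest, then keep only the newest one.
select ID, time
from
(
    select
        ID,
        time,
        row_number() over (partition by ID order by time desc) as time_rank
    from table_name
) x
-- row_number() yields exactly one rank-1 row per ID, so no group by is needed
where time_rank = 1;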
If subqueries are not available (older Hive versions), a temporary table is an option.
-- Materialize the ranked rows first.
create table tmp_table as
select
    ID,
    time,
    row_number() over (partition by ID order by time desc) as time_rank
from table_name;

-- Keep the newest row per ID (rank 1 is unique per ID, so no group by is needed).
select ID, time
from tmp_table
where time_rank = 1;

drop table tmp_table;
Answer 1 (score: 0)
Use a simple aggregation:
select id, max(time) as time
from table_name  -- placeholder; use your own table name (a bare "table" is a reserved word)
group by id
order by id; --order if necessary
Demo using your dataset:
select id, max(time) as time
from table_name
group by id;
OK
10038446 201706060100
10103517 201706060100
10124464 201706060100
Time taken: 30.66 seconds, Fetched: 3 row(s)
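To sanity-check both approaches without a Hive cluster, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the aggregation query and the row_number() query side by side. SQLite stands in for Hive here, which is an assumption: the two dialects differ in places, and the window-function query needs SQLite 3.25 or newer. The table name table_name is the placeholder from the first answer, and the helper latest_per_id is hypothetical.

```python
import sqlite3

# Sample (ID, time) rows copied from the question.
ROWS = [
    (10038446, 201705102100), (10038446, 201706052100),
    (10038446, 201706060000), (10038446, 201706060100),
    (10103517, 201705101700), (10103517, 201705102100),
    (10103517, 201706052100), (10103517, 201706060100),
    (10124464, 201701310100), (10124464, 201702210500),
    (10124464, 201702220500), (10124464, 201703062100),
    (10124464, 201705102100), (10124464, 201706052100),
    (10124464, 201706060100),
]

# Simple aggregation, as in the second answer.
AGGREGATE_SQL = """
select id, max(time) as time
from table_name
group by id
order by id
"""

# Window-function variant, as in the first answer.
WINDOW_SQL = """
select id, time
from (
    select id, time,
           row_number() over (partition by id order by time desc) as time_rank
    from table_name
) x
where time_rank = 1
order by id
"""


def latest_per_id(sql):
    """Hypothetical helper: load ROWS into a fresh in-memory table and run sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table table_name (id integer, time integer)")
        conn.executemany("insert into table_name values (?, ?)", ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


aggregate_result = latest_per_id(AGGREGATE_SQL)
window_result = latest_per_id(WINDOW_SQL)

for row_id, latest in aggregate_result:
    print(row_id, latest)
```

On this sample both queries return the same three rows. When only the ID and its latest time are needed, the aggregation is the simpler choice; the row_number() form is mainly useful when other columns of the newest row must be returned as well.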