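I have the dataset below. I want to retrieve certain column values based on a group by (or some other function) over specific columns. My dataset looks like this: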
id  zip  Action  content  duration  OS       TIME
=================================================
1   11   START   DELL               LINUX    12
1   11   JUMP    HP                 UNIX     14
1   11   STOP    HP       10        LINUX    16
1   11   START   WIN                LINUX    2
1   11   JUMP    HP                 UNIX     4
1   11   STOP    SONY     12        LINUX    15
2   12   START   HP                 UNIX     3
2   12   STOP    FOP      2         WINDOWS  10
-------------------------------------------------
For each (id, zip) group, I want all column values of the record with the maximum TIME, considering only records where Action = 'STOP'. My expected output is:
id  zip  Action  content  duration  OS
=========================================
1   11   STOP    HP       10        LINUX
2   12   STOP    FOP      2         WINDOWS
-----------------------------------------
How can I achieve this using Hive? Please help.
Answer 0 (score: 1)
ROW_NUMBER
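select id, zip, Action, content, duration, OS
from (
    -- number the STOP rows within each (id, zip) group, latest time first
    select *,
           row_number() over (
               partition by id, zip
               order by time desc
           ) as rn
    from mytable
    where action = 'STOP'
) t
-- keep only the latest STOP row of each group
where rn = 1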
+----+-----+--------+---------+----------+---------+
| id | zip | action | content | duration | os |
+----+-----+--------+---------+----------+---------+
| 1 | 11 | STOP | HP | 10 | LINUX |
| 2 | 12 | STOP | FOP | 2 | WINDOWS |
+----+-----+--------+---------+----------+---------+
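
As a possible alternative to row_number(), the same result can be sketched with an aggregate plus a join: compute the latest STOP time per (id, zip), then join back to pick up the remaining columns. This is only a sketch under the same assumptions as the query above (a table named mytable with the columns shown in the question); the aliases m and max_time are made up for illustration.

```sql
-- Sketch: latest STOP row per (id, zip) via max(time) and a self-join.
-- Assumes the same table and column names as the answer above.
select t.id, t.zip, t.action, t.content, t.duration, t.os
from mytable t
join (
    -- latest STOP time within each (id, zip) group
    select id, zip, max(time) as max_time
    from mytable
    where action = 'STOP'
    group by id, zip
) m
  on  t.id   = m.id
  and t.zip  = m.zip
  and t.time = m.max_time
where t.action = 'STOP'
```

One difference in behavior: if two STOP rows in the same group share the maximum time, this version returns both of them, whereas row_number() with rn = 1 keeps exactly one (an arbitrary one of the tied rows). For the sample data there are no ties, so both queries should give the two rows shown above.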