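Group by fields in Hive to get all columns

Time: 2017-06-10 14:09:03

Tags: sql hive hiveql

I have the dataset below. I want to get certain column values based on a group by (or some other function) over specific columns. My dataset looks like this: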

id  zip  Action  content  duration  OS       TIME
=================================================
1   11   START   DELL               LINUX    12
1   11   JUMP    HP                 UNIX     14
1   11   STOP    HP       10        LINUX    16
1   11   START   WIN                LINUX    2
1   11   JUMP    HP                 UNIX     4
1   11   STOP    SONY     12        LINUX    15
2   12   START   HP                 UNIX     3
2   12   STOP    FOP      2         WINDOWS  10
-------------------------------------------------

For each (id, zip) group, I want all column values from the record with the maximum TIME among the records filtered by Action = 'STOP'. My expected output is:

id  zip  Action  content  duration  OS
===========================================
1   11   STOP    HP       10        LINUX
2   12   STOP    FOP      2         WINDOWS
-------------------------------------------

How can I achieve this using Hive? Please help.

1 answer:

Answer 0 (score: 1)

ROW_NUMBER

select  id,zip,Action,content,duration,OS

from   (select  *
               ,row_number() over
                (
                    partition by    id,zip
                    order by        time desc
                )   as rn

        from    mytable

        where   action = 'STOP'
        ) t

where   rn = 1

+----+-----+--------+---------+----------+---------+
| id | zip | action | content | duration |   os    |
+----+-----+--------+---------+----------+---------+
|  1 |  11 | STOP   | HP      |       10 | LINUX   |
|  2 |  12 | STOP   | FOP     |        2 | WINDOWS |
+----+-----+--------+---------+----------+---------+
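
For comparison, the same selection can be sketched as a join against the per-group maximum time. This is only an illustrative alternative, not part of the original answer. It assumes the table name mytable used in the query above and that time is stored as a numeric column; the backticks around time are a precaution in case a given Hive version treats it as a keyword. On the sample data it should pick the same two rows, but unlike row_number it returns more than one row per group when several STOP records tie on the maximum time.

```sql
-- Sketch: keep each STOP row whose time equals the latest STOP time
-- of its (id, zip) group. Table name mytable is taken from the answer above.
select  t.id, t.zip, t.action, t.content, t.duration, t.os

from    mytable t

join    (select   id, zip, max(`time`) as max_time   -- latest STOP time per group
         from     mytable
         where    action = 'STOP'
         group by id, zip
        ) m

on      t.id     = m.id
and     t.zip    = m.zip
and     t.`time` = m.max_time

where   t.action = 'STOP'
```

The row_number form reads the table once and always yields exactly one row per group, which is usually the simpler choice when ties do not matter.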