How do I keep only the most recent records in Hive?

Time: 2018-05-08 12:39:17

Tags: hive

I have logic that joins several Hive tables to produce the output below (the full table, including the duplicate rows).

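However, I need some help. For records with the same status ID (for example 5 or 17), I want to keep only the one with the smallest Ser_NO value.

The catch is that a status ID can legitimately recur after the status has changed. For example, status ID 17 appears again in record 13, and that record should be kept because the process was restarted and came back to that status.

So sorting by date, time, and status and then removing duplicates does not serve my purpose.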

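I would need to set up a loop that checks whether the status ID has changed compared with the previous record, and filters the record out if the status ID is the same.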
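The "compare with the previous record" check described above can be sketched in a few lines. This is only an illustration of the logic, not a Hive implementation: the `(Ser_NO, STATUS)` pairs are copied from the table in this question, the function and variable names are hypothetical, and it assumes all rows belong to a single ID / ID_NO pair, as in the sample data.

```python
# Illustrative sketch only: rows are (Ser_NO, STATUS) pairs copied from the
# question's table; keep_status_changes and the variable names are hypothetical.

def keep_status_changes(rows):
    """Keep each row whose status differs from the row immediately before it."""
    kept = []
    previous_status = None  # there is no record before the first row
    for ser_no, status in sorted(rows):  # process rows in Ser_NO order
        if status != previous_status:
            kept.append((ser_no, status))
        previous_status = status
    return kept


rows = [
    (1, 5), (2, 5), (3, 15), (4, 17), (5, 17), (6, 17), (7, 40), (8, 25),
    (9, 55), (10, 55), (11, 4), (12, 15), (13, 17), (14, 40), (15, 40),
    (16, 25), (17, 55), (18, 55), (24, 60), (25, 55),
]

kept_ser_nos = [ser_no for ser_no, _ in keep_status_changes(rows)]
print(kept_ser_nos)
```

In Hive itself an explicit loop is probably unnecessary: the same comparison can likely be written with the `LAG` window function, for example `LAG(status) OVER (PARTITION BY id, id_no ORDER BY ser_no)`, keeping only the rows where that value is NULL or differs from `status`. The column names in that expression are assumptions based on the table headers.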

The expected output should be:

    Ser_NO  ID   ID_NO  STATUS  DESCRIPTION                  initiated_dt  time
    1       100  10     5       Initiated                    20180426      000601
    3       100  10     15      BM(O) review                 20180426      021424
    4       100  10     17      BM(O) & SME Review           20180426      021552
    7       100  10     40      Pending BSDA First Approval  20180426      021810
    8       100  10     25      Pending Controller approval  20180426      021844
    9       100  10     55      Booking SDA Completed        20180426      021917
    11      100  10     4       Re-Initiated                 20180426      021944
    12      100  10     15      BM(O) review                 20180426      030648
    13      100  10     17      BM(O) & SME Review           20180426      030714
    14      100  10     40      Pending BSDA First Approval  20180426      030734
    16      100  10     25      Pending Controller approval  20180426      030805
    17      100  10     55      Booking SDA Completed        20180426      030837
    24      100  10     60      Shipping SDA Completed       20180426      031056
    25      100  10     55      Booking SDA Completed        20180426      031124

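But I am wondering whether there is a simpler way to achieve this.

For reference, the current output of the join, before any filtering: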

    Ser_NO  ID   ID_NO  STATUS  DESCRIPTION                  initiated_dt  time
    1       100  10     5       Initiated                    20180426      000601
    2       100  10     5       Initiated                    20180426      021408
    3       100  10     15      BM(O) review                 20180426      021424
    4       100  10     17      BM(O) & SME Review           20180426      021552
    5       100  10     17      BM(O) & SME Review           20180426      021621
    6       100  10     17      BM(O) & SME Review           20180426      021639
    7       100  10     40      Pending BSDA First Approval  20180426      021810
    8       100  10     25      Pending Controller approval  20180426      021844
    9       100  10     55      Booking SDA Completed        20180426      021917
    10      100  10     55      Booking SDA Completed        20180426      021917
    11      100  10     4       Re-Initiated                 20180426      021944
    12      100  10     15      BM(O) review                 20180426      030648
    13      100  10     17      BM(O) & SME Review           20180426      030714
    14      100  10     40      Pending BSDA First Approval  20180426      030734
    15      100  10     40      Pending BSDA First Approval  20180426      030805
    16      100  10     25      Pending Controller approval  20180426      030805
    17      100  10     55      Booking SDA Completed        20180426      030837
    18      100  10     55      Booking SDA Completed        20180426      030837
    24      100  10     60      Shipping SDA Completed       20180426      031056
    25      100  10     55      Booking SDA Completed        20180426      031124

0 answers:

No answers yet