Question

This is my Hive table:

       id                    name                   starttime (datatype: string)

    0000031               workflows_status       Thu, 18 Feb 2016 14:21:38 GMT  
    0000030               workflows_status       Thu, 18 Feb 2016 14:16:28 GMT  
    0000029               workflows_status       Thu, 18 Feb 2016 14:07:25 GMT  
    0000336               hive_test              Tue, 16 Feb 2016 09:27:54 GMT  
    0000335               hive_test              Tue, 16 Feb 2016 09:17:52 GMT  
    0000334               hive_test              Tue, 16 Feb 2016 09:00:26 GMT

I want a Hive query that returns the following result, i.e. the latest record for each name:

    id               name                   starttime

    0000031          workflows_status       Thu, 18 Feb 2016 14:21:38 GMT
    0000336          hive_test              Tue, 16 Feb 2016 09:27:54 GMT

Answer 1

You can use the following query to get the desired output:

select * from (select id, name, starttime, rank() over (partition by name order by unix_timestamp(starttime, 'EEE, dd MMM yyyy HH:mm:ss z') desc) as rnk from hive_table) a where a.rnk = 1;
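The ordering depends on unix_timestamp being able to parse the starttime string, so it can help to check the pattern against one of the sample values first. A minimal sketch, assuming the pattern uses HH (24-hour clock), which times such as 14:21:38 need:

```sql
-- Expect an epoch-seconds value here; a NULL (or otherwise unexpected) result
-- suggests the pattern does not match the string format.
select unix_timestamp('Thu, 18 Feb 2016 14:21:38 GMT', 'EEE, dd MMM yyyy HH:mm:ss z');
```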

Answer 2

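Hive supports this kind of operation through its Windowing and Analytics Functions.

Using the RANK() function together with an OVER clause gives the desired result. The OVER clause partitions the rows by the specified column and orders them within each partition; filtering on rank = 1 then returns the first row of each group. This is similar to ROWNUM = 1 in Oracle.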

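select * from (
  select
    id,
    name,
    starttime,
    -- starttime is a string, so parse it before ordering; newest row gets rank 1
    rank() over (
      partition by name
      order by unix_timestamp(starttime, 'EEE, dd MMM yyyy HH:mm:ss z') desc
    ) as rank_alias
  from hive_table
) a where a.rank_alias = 1;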
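
One thing to keep in mind: rank() gives the same rank to rows that tie on the ordering value, so if two rows of the same name share the latest starttime, both come back. If exactly one row per name is needed, a variation with row_number() may fit better. This is only a sketch against the same hive_table; breaking ties on id is an assumption, not something the question specifies.

```sql
-- Sketch: one row per name, newest first.
-- The "id desc" tie-breaker is an assumption for illustration.
select id, name, starttime
from (
  select
    id,
    name,
    starttime,
    row_number() over (
      partition by name
      order by unix_timestamp(starttime, 'EEE, dd MMM yyyy HH:mm:ss z') desc, id desc
    ) as rn
  from hive_table
) a
where a.rn = 1;
```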

How to find the latest record for each group in Hive
