Question

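I am using Hive. Suppose I have a table Log(userID, time, describe), and I want to select userID, min(time), and describe from Log, grouped by userID, so that describe comes from the same row as the minimum time. Is there a way to do this in one step with Hive? It looks simple when I think of it as a MapReduce <key, value> algorithm, but I have a lot of files and this would add another step, which is why I am asking.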

Example:

(userID, time, describe) = {(1, 2, 2), (2, 3, 3), (1, 1, 1), (1, 3, 3), (2, 1, 1)}

What I expect:

(userID, time, describe) = {(1, 1, 1), (2, 1, 1)}
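
The key/value view mentioned in the question can be sketched in plain Python. This is only an illustration of the per-key reduction, not Hive or MapReduce code; the function name `earliest_row_per_user` and the `Row` alias are hypothetical names introduced for the example.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, int, int]  # (userID, time, describe)


def earliest_row_per_user(rows: Iterable[Row]) -> Dict[int, Row]:
    """Keep, for each userID (the key), the whole row with the smallest time."""
    best: Dict[int, Row] = {}
    for row in rows:
        user_id, time, _describe = row
        # Reduce step: replace the stored row only if this one is strictly earlier.
        if user_id not in best or time < best[user_id][1]:
            best[user_id] = row
    return best


# Sample data from the question.
rows = [(1, 2, 2), (2, 3, 3), (1, 1, 1), (1, 3, 3), (2, 1, 1)]
result = sorted(earliest_row_per_user(rows).values())
print(result)  # [(1, 1, 1), (2, 1, 1)]
```

Note that this sketch keeps only the first row seen when two rows of the same user share the minimum time.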

Answer 1

This can be done with a CTE and a ranking window function, for example:

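    -- Rank each user's rows by time; the earliest row(s) get rnk = 1.
    -- Backquotes guard against `time` and `describe` clashing with reserved
    -- keywords (DESCRIBE is reserved in recent Hive versions).
    with result as
    (
        select userID,
               `time`,
               `describe`,
               rank() over (partition by userID order by `time` asc) as rnk
        from Log
    )
    -- rank() keeps every row tied for the minimum time;
    -- use row_number() instead to get exactly one row per user.
    select userID,
           `time`,
           `describe`
    from result
    where rnk = 1;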
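
As a rough sanity check, the same ranking query can be run against the sample data from the question. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Hive, which is an assumption made purely for illustration: it needs SQLite 3.25 or later for window functions, uses double quotes instead of backquotes for identifiers, and says nothing about Hive's execution plan or performance.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Log (userID INTEGER, "time" INTEGER, "describe" INTEGER)')
conn.executemany(
    "INSERT INTO Log VALUES (?, ?, ?)",
    [(1, 2, 2), (2, 3, 3), (1, 1, 1), (1, 3, 3), (2, 1, 1)],
)

query = """
WITH result AS (
    SELECT userID, "time", "describe",
           rank() OVER (PARTITION BY userID ORDER BY "time" ASC) AS rnk
    FROM Log
)
SELECT userID, "time", "describe" FROM result WHERE rnk = 1
"""

# Sort in Python so the comparison does not depend on the engine's row order.
earliest = sorted(conn.execute(query).fetchall())
print(earliest)  # [(1, 1, 1), (2, 1, 1)]

# A second row for user 2 with the same minimum time: rank() returns both.
conn.execute("INSERT INTO Log VALUES (2, 1, 9)")
with_tie = sorted(conn.execute(query).fetchall())
print(with_tie)  # [(1, 1, 1), (2, 1, 1), (2, 1, 9)]
```

The second query shows why the choice between rank() and row_number() matters when several rows of one user share the minimum time.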
