Hive logic to get the minimum time, the maximum time, and other columns

Time: 2017-07-25 09:13:06

Tags: sql hive

I have data in the following format:

+---------------------+-------------------------+-------------------------+-----------+------+
|         id          |       start time        |        end time         | direction | name |
+---------------------+-------------------------+-------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 15:10:28.677 | 2015-06-02 15:32:22.677 |         3 | xyz  |
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:12:18.84  |         1 | xyz  |
+---------------------+-------------------------+-------------------------+-----------+------+

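For each id, I need the minimum start time, the maximum end time, and the direction and name from the row that has the minimum start time: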

+---------------------+-------------------------+------------------------+-----------+------+
|         id          |       start time        |        end time        | direction | name |
+---------------------+-------------------------+------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:32:22.677|         1 | xyz  |
+---------------------+-------------------------+------------------------+-----------+------+

I have tried the following:

select x.id,
       min(x.start_time) as mintime,
       max(x.end_time)   as maxtime,
       y.direction, y.name
from dir_samp x
inner join (
  select id, start_time, end_time, name, direction,
         rank() over (partition by id order by start_time asc) as r
  from dir_samp
) y on x.id = y.id
where y.r = 1
group by x.id, y.direction, y.name

Is there a more efficient way to do this? If so, please share it.

Thanks

2 answers:

Answer 0 (score: 1)

You don't need the inner join:

select y.id, min(y.start_time) as mintime, 
       max(y.end_time) maxtime , 
       max(case when y.r=1 then y.direction end) as direction, 
       max(case when y.r=1 then y.name end) as name 
from
( 
 select id, start_time,  end_time, name, direction ,  
   rank() over ( partition by id order by start_time asc) as r 
   from dir_samp 
) y 
group by y.id;
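
One caveat, offered as a sketch rather than as part of the original answer: rank() gives every row tied on the earliest start_time the value r = 1, so the two max(case ...) expressions could pick direction and name from different rows. Assuming that exactly one row per id should win, row_number() with a tie-breaker avoids this. The tie-breaker column (direction) is an assumption chosen only for illustration; the table and column names are those from the question.

-- Sketch only: same table (dir_samp) and columns as in the question.
-- row_number() assigns r = 1 to exactly one row per id, even when start_time ties.
select y.id,
       min(y.start_time) as mintime,
       max(y.end_time)   as maxtime,
       max(case when y.r = 1 then y.direction end) as direction,
       max(case when y.r = 1 then y.name end)      as name
from (
  select id, start_time, end_time, name, direction,
         -- "direction asc" is a hypothetical tie-breaker; use whatever makes sense for the data
         row_number() over (partition by id
                            order by start_time asc, direction asc) as r
  from dir_samp
) y
group by y.id;

If start_time is already unique within each id, this returns the same result as the rank() version above.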

Answer 1 (score: 1)

select      id
           ,min_vals.start_time
           ,end_time
           ,min_vals.direction
           ,min_vals.name

from       (select      id  
                       ,min(named_struct('start_time',start_time,'direction',direction,'name',name)) as min_vals
                       ,max(end_time)                                                                as end_time

            from        dir_samp

            group by    id
            ) t
;
+---------------------+----------------------------+----------------------------+-----------+------+
| id                  | start_time                 | end_time                   | direction | name |
+---------------------+----------------------------+----------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 14:55:37.353000 | 2015-06-02 15:32:22.677000 | 1         | xyz  |
+---------------------+----------------------------+----------------------------+-----------+------+

This works because min() over a struct compares the fields in order, so start_time decides which row wins, and direction and name come from that same row.