Hive: Group a column based on the maximum value

Time: 2017-04-06 15:41:43

Tags: hadoop hive hiveql

I have a table with the following fields:

date       value
10-02-1900 23
09-05-1901 22
10-03-1900 10
10-02-1901 24
...

I have to return the maximum value for each year, i.e.,

1900 23
1901 24

I tried a query, but it gave the wrong result.

Can someone suggest how to do this?
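To try the queries, the sample rows can be loaded into a small test table. This is a minimal sketch that assumes the table is named `t` (as in the answer's queries), stores the date as a string in dd-MM-yyyy format, and uses int for `value`. All three are assumptions. `INSERT ... VALUES` requires Hive 0.14 or later.

```sql
-- Hypothetical test table: the name t matches the answer's queries;
-- the column types (string date in dd-MM-yyyy, int value) are assumptions.
-- `date` is backquoted because DATE is a reserved keyword in Hive.
create table t (`date` string, value int);

-- INSERT ... VALUES is available from Hive 0.14 onward.
insert into table t values
    ('10-02-1900', 23),
    ('09-05-1901', 22),
    ('10-03-1900', 10),
    ('10-02-1901', 24);
```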

1 Answer:

Answer 0 (score: 1)

Option 1

-- `date` is backquoted because DATE is a reserved keyword in Hive
select      year(from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy'))) as year
           ,max(value)                                               as max_value
from        t
group by    year(from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy')))
;

Option 2

Before Hive 2.2.0:

set hive.groupby.orderby.position.alias=true;

Since Hive 2.2.0:

set hive.groupby.position.alias=true;
select      year(from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy'))) as year
           ,max(value)                                               as max_value
from        t
group by    1
;
+------+-----------+
| year | max_value |
+------+-----------+
| 1900 |        23 |
| 1901 |        24 |
+------+-----------+

P.S.

Another way to extract the year:

from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy'),'yyyy')
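Plugged into Option 1, the format-based extraction might look like the sketch below. It assumes the same table `t`. `from_unixtime(seconds, 'yyyy')` returns a string such as '1900', so the cast to int is an addition, not part of the original answer. It keeps the year column numeric, like `year()`. Without the cast, grouping still works, but on string values.

```sql
-- Sketch: Option 1 rewritten with the format-based year extraction.
-- The cast to int is an assumption added so the result type matches year().
select      cast(from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy'),'yyyy') as int) as year
           ,max(value)                                                              as max_value
from        t
group by    cast(from_unixtime(unix_timestamp(`date`,'dd-MM-yyyy'),'yyyy') as int)
;
```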