I have a dataset in a Hive table, as shown below:
date col1 col2 col3
2016-02-01 A X 5
2016-02-03 A X 5
2016-02-04 A X 5
2016-03-01 A X 6
2016-04-01 A X 5
2016-04-02 A Y 5
2016-04-03 A Y 5
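I need a kind of selective grouping on col1 and col2 where a new group starts whenever the value of col3 changes. For example, col3 changes from 5 to 6 in row 4, so rows 1 to 3 form one group and row 4 starts another. For each group I need the minimum and maximum of the date column. The output should look like this: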
col1 col2 col3 minDate maxDate
A X 5 2016-02-01 2016-02-04
A X 6 2016-03-01 2016-03-01
A X 5 2016-04-01 2016-04-01
A Y 5 2016-04-02 2016-04-03
I'm fairly sure a simple GROUP BY on col1 and col2 won't work. Can anyone suggest a way to achieve this?
Answer (score: 2)
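-- `date` is backquoted because DATE is a reserved keyword in newer Hive versions
select col1, col2, col3
      ,min(`date`) as minDate
      ,max(`date`) as maxDate
from (select *
            -- position of the row among all rows of its (col1, col2) partition
            ,row_number() over
             (
                 partition by col1, col2
                 order by `date`
             ) as rn_part_1_2
            -- position of the row among rows sharing the same (col1, col2, col3)
            ,row_number() over
             (
                 partition by col1, col2, col3
                 order by `date`
             ) as rn_part_1_2_3
      from mytable
     ) t
-- the difference is constant within each consecutive run of one col3 value
-- and changes whenever the run is interrupted, so it identifies the run
group by col1, col2, col3
        ,rn_part_1_2 - rn_part_1_2_3
order by col1, col2
        ,minDate
;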
+------+------+------+------------+------------+
| col1 | col2 | col3 | mindate | maxdate |
+------+------+------+------------+------------+
| A | X | 5 | 2016-02-01 | 2016-02-04 |
| A | X | 6 | 2016-03-01 | 2016-03-01 |
| A | X | 5 | 2016-04-01 | 2016-04-01 |
| A | Y | 5 | 2016-04-02 | 2016-04-03 |
+------+------+------+------------+------------+
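The query relies on the gaps-and-islands pattern: inside one (col1, col2) partition, the row number over all rows and the row number restricted to a single col3 value advance together while col3 stays the same, so their difference is constant for each consecutive run and jumps whenever another col3 value interrupts it. The Python sketch below is not part of the original answer. It assumes SQLite 3.25 or newer (for window functions) as a local stand-in for Hive, runs the same query shape on the sample rows, and cross-checks it against a plain `itertools.groupby` version of the "start a new group whenever col3 changes" rule. The function names `islands_sql` and `islands_python` are hypothetical.

```python
import sqlite3
from itertools import groupby

# Sample rows copied from the question: (date, col1, col2, col3)
ROWS = [
    ("2016-02-01", "A", "X", 5),
    ("2016-02-03", "A", "X", 5),
    ("2016-02-04", "A", "X", 5),
    ("2016-03-01", "A", "X", 6),
    ("2016-04-01", "A", "X", 5),
    ("2016-04-02", "A", "Y", 5),
    ("2016-04-03", "A", "Y", 5),
]

# Same shape as the Hive answer; SQLite >= 3.25 is assumed for window functions.
QUERY = """
select col1, col2, col3,
       min(date) as minDate,
       max(date) as maxDate
from (select *,
             row_number() over (partition by col1, col2 order by date) as rn_part_1_2,
             row_number() over (partition by col1, col2, col3 order by date) as rn_part_1_2_3
      from mytable) t
group by col1, col2, col3, rn_part_1_2 - rn_part_1_2_3
order by col1, col2, minDate
"""


def islands_sql(rows):
    """Run the row_number-difference query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (date text, col1 text, col2 text, col3 integer)")
    conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


def islands_python(rows):
    """Reference version: split each (col1, col2) partition into consecutive runs of col3."""
    out = []
    # Sort by (col1, col2, date) so each partition is contiguous and in date order.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for (c1, c2), part in groupby(ordered, key=lambda r: (r[1], r[2])):
        # groupby merges only *adjacent* equal keys, i.e. a new group starts on every change
        for c3, run in groupby(part, key=lambda r: r[3]):
            dates = [r[0] for r in run]  # ISO date strings compare correctly as text
            out.append((c1, c2, c3, min(dates), max(dates)))
    # Match the query's "order by col1, col2, minDate"
    return sorted(out, key=lambda r: (r[0], r[1], r[3]))


if __name__ == "__main__":
    for row in islands_sql(ROWS):
        print(row)
```

For the A/X partition the difference `rn_part_1_2 - rn_part_1_2_3` comes out as 0 for the first three rows with col3 = 5, 3 for the row with 6, and 1 for the last 5, which is why the two runs of 5 end up in separate groups even though they share the same col3 value.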