Incrementing a column value per unique key in Hive

Time: 2017-09-07 12:19:24

Tags: hadoop hive

Below is the data.

The combination of col1 and col2 forms the unique key. col3 holds timestamps in two different formats: a TZ format (2015-02-14T03:45:23.345Z) and a plain format (2015-02-14 03:45:23).

col1,col2,col3,col4
rank1,IP1,2015-02-14T03:45:23.345Z,2015-02-12 00:00:00Z
rank1,IP1,2015-02-14T03:45:23.145Z,2015-02-12 00:00:00Z
rank1,IP1,2015-02-14T03:45:23.465Z,2015-02-12 00:00:00Z
rank1,IP2,2015-02-14T03:45:23.345Z,2015-02-12 00:00:00Z
rank1,IP2,2015-02-14T03:45:23.125Z,2015-02-12 00:00:00Z
rank2,IP1,2015-02-14 03:44:11,2015-02-12 00:00:00Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:00Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:00Z

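Within each unique key (col1, col2), the rows need to be sorted by col3 in ascending order. Once col3 is in ascending order, col4 should be incremented by 1 second for each successive row.

Here is my query.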

select col1,col2,col3,CONCAT(from_unixtime(unix_timestamp(col4, "yyyy-MM-dd HH:mm:ss") + row_number() over 
(partition by col1,col2 order by from_unixtime(unix_timestamp(col3, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) asc) ),'Z') As col4
from ipconf;

The output I get is incorrect:

rank1,IP1,2015-02-14T03:45:23.465Z,2015-02-12 00:00:01Z
rank1,IP1,2015-02-14T03:45:23.345Z,2015-02-12 00:00:02Z
rank1,IP1,2015-02-14T03:45:23.145Z,2015-02-12 00:00:03Z
rank1,IP2,2015-02-14T03:45:23.125Z,2015-02-12 00:00:01Z
rank1,IP2,2015-02-14T03:45:23.345Z,2015-02-12 00:00:02Z
rank2,IP1,2015-02-14 03:44:11,2015-02-12 00:00:01Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:02Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:03Z

Expected output:

rank1,IP1,2015-02-14T03:45:23.145Z,2015-02-12 00:00:01Z
rank1,IP1,2015-02-14T03:45:23.345Z,2015-02-12 00:00:02Z
rank1,IP1,2015-02-14T03:45:23.465Z,2015-02-12 00:00:03Z
rank1,IP2,2015-02-14T03:45:23.125Z,2015-02-12 00:00:01Z
rank1,IP2,2015-02-14T03:45:23.345Z,2015-02-12 00:00:02Z
rank2,IP1,2015-02-14 03:44:11,2015-02-12 00:00:01Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:02Z
rank2,IP1,2015-02-14 03:45:23,2015-02-12 00:00:03Z

1 answer:

Answer 0 (score: 0)

The following query should work for you.
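As a sketch of one way to get the expected ordering (not necessarily the query originally posted with this answer): unix_timestamp returns whole seconds, so the millisecond part of col3 is dropped and the rows of a key such as rank1,IP1 all tie in the ORDER BY, which leaves their order arbitrary. The plain-format rows do not match the 'T'...'Z' pattern at all, so their sort key is most likely NULL. The version below instead normalizes col3 to the yyyy-MM-dd HH:mm:ss[.SSS] form and casts it to TIMESTAMP, which keeps the milliseconds. The regexp_replace/CAST normalization is my own assumption about how to handle both formats; the col4 expression and the ipconf table are taken unchanged from the question.

```sql
SELECT col1,
       col2,
       col3,
       CONCAT(
         from_unixtime(
           -- base time from col4, plus 1 second per row within the key
           unix_timestamp(col4, 'yyyy-MM-dd HH:mm:ss')
           + row_number() OVER (
               PARTITION BY col1, col2
               -- turn '2015-02-14T03:45:23.345Z' into '2015-02-14 03:45:23.345'
               -- so both formats cast to TIMESTAMP and milliseconds are kept
               ORDER BY CAST(
                 regexp_replace(regexp_replace(col3, 'T', ' '), 'Z$', '')
                 AS TIMESTAMP
               ) ASC
             )
         ),
         'Z'
       ) AS col4
FROM ipconf;
```

If every key only ever contains one of the two formats, as in the sample data, ordering by the raw col3 string (ORDER BY col3 ASC) should give the same result, because both formats sort chronologically as plain text. The cast is only needed when the two formats can be mixed under the same key.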
