I have a dataset in a Hive table:
input1,input2,input_time
key1,val1,2017-02-03 00:00:00
key1,val1,2017-02-03 00:00:00
key1,val2,2017-02-03 00:00:00
key1,val2,2017-02-03 00:00:00
key2,val1,2017-02-03 00:00:00
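The columns (input1, input2) form a unique combination. For records that share the same combination, I want to increment the input_time column by one second per record, i.e. "2017-02-03 00:00:01", "2017-02-03 00:00:02", and so on.
If, say, I have 65 records for the same combination, then once the seconds pass 59 the increment should carry over into the minutes, e.g. "2017-02-03 00:01:01".
How can I increment the time for records with the same combination? Is this possible in Hive?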
Expected output:
input1,input2,input_time
key1,val1,2017-02-03 00:00:01
key1,val1,2017-02-03 00:00:02
key1,val2,2017-02-03 00:00:01
key1,val2,2017-02-03 00:00:02
key2,val1,2017-02-03 00:00:01
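The carry-over described in the question is plain seconds arithmetic: if each duplicate row gets an offset of n seconds from the original time, the minutes (and hours) roll over on their own. A minimal Python sketch, assuming the base time from the sample data and a hypothetical helper `nth_timestamp`, for 65 rows of one combination:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def nth_timestamp(base: str, n: int) -> str:
    """Return base + n seconds; minutes and hours carry over automatically."""
    start = datetime.strptime(base, FMT)
    return (start + timedelta(seconds=n)).strftime(FMT)

# 65 duplicate rows of the same (input1, input2) combination get offsets 1..65
stamps = [nth_timestamp("2017-02-03 00:00:00", n) for n in range(1, 66)]
print(stamps[0])   # 2017-02-03 00:00:01
print(stamps[58])  # 2017-02-03 00:00:59
print(stamps[59])  # 2017-02-03 00:01:00
print(stamps[60])  # 2017-02-03 00:01:01
print(stamps[64])  # 2017-02-03 00:01:05
```

So no special handling for the 59-second boundary is needed as long as the offset is added to a real timestamp rather than to the seconds field of a string.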
Answer:
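You can use a window function to generate a temporary per-row index within each combination, and then add that index (in seconds) to the row's time. The queries here use a test table ts_test with columns k, v and ts, corresponding to input1, input2 and input_time.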
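select
k, v, unix_timestamp(ts) as ts,
-- order by is added for engines that require an ordered window (e.g. Spark SQL);
-- within a group of identical rows the order does not change the result
row_number() over (partition by k, v order by ts) as rn
from ts_test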
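Outside Hive, `row_number() over (partition by k, v)` can be pictured as a per-group counter. The Python sketch below is only an illustration (the function name `with_row_number` is hypothetical); unlike SQL, it numbers rows in their input order, whereas Hive gives no ordering guarantee without ORDER BY. Since the rows within a group are identical here, that difference does not affect the result.

```python
from collections import defaultdict

rows = [
    ("key1", "val1", "2017-02-03 00:00:00"),
    ("key1", "val1", "2017-02-03 00:00:00"),
    ("key1", "val2", "2017-02-03 00:00:00"),
    ("key1", "val2", "2017-02-03 00:00:00"),
    ("key2", "val1", "2017-02-03 00:00:00"),
]

def with_row_number(rows):
    """Mimic row_number() over (partition by k, v): count 1, 2, ... per (k, v)."""
    counters = defaultdict(int)
    out = []
    for k, v, ts in rows:
        counters[(k, v)] += 1
        out.append((k, v, ts, counters[(k, v)]))
    return out

ranked = with_row_number(rows)
print([row[3] for row in ranked])  # [1, 2, 1, 2, 1]
```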
This produces the following (the epoch value depends on the session time zone):
+----+----+----------+---+
| k| v| ts| rn|
+----+----+----------+---+
|key1|val1|1486101600| 1|
|key1|val1|1486101600| 2|
|key1|val2|1486101600| 1|
|key1|val2|1486101600| 2|
|key2|val1|1486101600| 1|
+----+----+----------+---+
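Because the time string is already in the default yyyy-MM-dd HH:mm:ss format, unix_timestamp can parse it directly. Add rn seconds to the result and convert it back to a string with from_unixtime: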
SELECT a.k, a.v, from_unixtime(a.ts + a.rn) as newts from
( select k, v, unix_timestamp(ts) as ts,
         row_number() over (partition by k, v order by ts) as rn
  from ts_test ) a
+----+----+-------------------+
| k| v| newts|
+----+----+-------------------+
|key1|val1|2017-02-03 00:00:01|
|key1|val1|2017-02-03 00:00:02|
|key1|val2|2017-02-03 00:00:01|
|key1|val2|2017-02-03 00:00:02|
|key2|val1|2017-02-03 00:00:01|
+----+----+-------------------+
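The unix_timestamp / from_unixtime round trip can be mimicked in Python to check the arithmetic. This is a sketch with hypothetical helpers named after the Hive functions; both helpers deliberately use UTC, whereas Hive uses the session time zone, which is why the epoch value here differs from the 1486101600 shown earlier. Because the same zone is used in both directions, the resulting strings match the table above.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def unix_timestamp(ts: str) -> int:
    # Assumption: interpret the string as UTC (Hive uses the session time zone)
    return int(datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp())

def from_unixtime(seconds: int) -> str:
    # Convert epoch seconds back to a string in the same zone (UTC)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)

# (k, v, ts, rn) rows as produced by the row_number() query
ranked = [
    ("key1", "val1", "2017-02-03 00:00:00", 1),
    ("key1", "val1", "2017-02-03 00:00:00", 2),
    ("key1", "val2", "2017-02-03 00:00:00", 1),
    ("key1", "val2", "2017-02-03 00:00:00", 2),
    ("key2", "val1", "2017-02-03 00:00:00", 1),
]
result = [(k, v, from_unixtime(unix_timestamp(ts) + rn)) for k, v, ts, rn in ranked]

print(unix_timestamp("2017-02-03 00:00:00"))  # 1486080000 (UTC)
for row in result:
    print(row)
```

An rn of 61 would give 2017-02-03 00:01:01, which is the minute carry-over the question asks about.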
As @DuduMarkovitz pointed out, this can also be done with a single SELECT:
select
k, v ,
from_unixtime(unix_timestamp(ts) + row_number() over ( partition by k,v order by v asc ) ) as newts
from ts_test
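To try the single-SELECT form without a Hive cluster, the same pattern (a window function used inside an arithmetic expression) can be run in SQLite from Python. This swaps in SQLite, not Hive: `strftime('%s', ts)` stands in for `unix_timestamp(ts)` and `datetime(n, 'unixepoch')` for `from_unixtime(n)`, both treating the string as UTC, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ts_test (k TEXT, v TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO ts_test VALUES (?, ?, ?)",
    [
        ("key1", "val1", "2017-02-03 00:00:00"),
        ("key1", "val1", "2017-02-03 00:00:00"),
        ("key1", "val2", "2017-02-03 00:00:00"),
        ("key1", "val2", "2017-02-03 00:00:00"),
        ("key2", "val1", "2017-02-03 00:00:00"),
    ],
)

# SQLite equivalents (assumption, not Hive syntax):
#   strftime('%s', ts)      ~ unix_timestamp(ts)
#   datetime(n, 'unixepoch') ~ from_unixtime(n)
query = """
SELECT k, v,
       datetime(CAST(strftime('%s', ts) AS INTEGER)
                + row_number() OVER (PARTITION BY k, v ORDER BY ts),
                'unixepoch') AS newts
FROM ts_test
ORDER BY k, v, newts
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

The outer ORDER BY only makes the printed order deterministic; the window itself does the per-combination numbering.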