Preserving order when collecting records in Hive

Date: 2014-07-17 13:40:29

Tags: hive hiveql

I have a Hive table as follows:

select id, id_2, val from test order by id;

234 974 0.5
234 457 0.7
234 236 0.5
234 859 0.6
123 859 0.7
123 236 0.6
123 974 0.5
123 457 0.5

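I am trying to collect the id_2 and val values for each id. The collected data must follow the same order in every row. My expected output is below (any order is fine, as long as it is the same for all rows):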

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

I used the collect UDF from Brickhouse.

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id) tmp
group by tmp.id
;

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [859,236,974,457]   [0.7,0.6,0.5,0.5]

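As you can see, the order within the collected columns is not preserved. Is there a way to keep the ordering consistent across the whole output? Any hints would be appreciated.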

2 answers:

Answer 0 (score: 2)

Use this query:

select tmp.id, collect(id_2), collect(tmp.val) from
(select id, id_2, val from test
order by id desc, id_2 desc) tmp
group by tmp.id
;

The output is as follows:

234 [974,457,236,859]   [0.5,0.7,0.5,0.6]
123 [974,457,236,859]   [0.5,0.5,0.6,0.7]

The only change is in the subquery's sort clause, from

order by id

to

   order by id desc, id_2 desc
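
The idea behind this answer can be modeled outside Hive. The following is a minimal Python sketch, not HiveQL: it sorts the question's rows by (id desc, id_2 desc) and then collects id_2 and val per id. The function name collect_in_order is hypothetical. It assumes the sorted order survives the grouping step, which holds in this sketch but which Hive may not guarantee for a subquery ORDER BY followed by GROUP BY, so check the result on your own data.

```python
# Rows from the question: (id, id_2, val)
ROWS = [
    (234, 974, 0.5), (234, 457, 0.7), (234, 236, 0.5), (234, 859, 0.6),
    (123, 859, 0.7), (123, 236, 0.6), (123, 974, 0.5), (123, 457, 0.5),
]


def collect_in_order(rows):
    """Sort by (id desc, id_2 desc), then collect id_2 and val per id."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]), reverse=True)
    groups = {}  # dicts keep insertion order in Python 3.7+
    for id_, id_2, val in ordered:
        id2s, vals = groups.setdefault(id_, ([], []))
        id2s.append(id_2)
        vals.append(val)
    return groups


if __name__ == "__main__":
    for id_, (id2s, vals) in collect_in_order(ROWS).items():
        print(id_, id2s, vals)
```

In this sketch every id ends up with its id_2 values in descending order. That is not the same sequence as in the listing above, but it is identical for every id, which is the only requirement stated in the question.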

Answer 1 (score: -2)

Use sort_array(collect_list(cols));
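
One caveat with this suggestion: sort_array sorts each array by its own values, so applying it to the id_2 list and the val list separately can break the pairing between them. The Python sketch below (a model of the behavior, not HiveQL) uses the rows for id 234 from the question. The function names are hypothetical, and sort_pairs_together only illustrates one possible alternative, which is sorting the (id_2, val) pairs as a unit.

```python
# Rows for id 234 from the question, as (id_2, val) pairs in arrival order
PAIRS_234 = [(974, 0.5), (457, 0.7), (236, 0.5), (859, 0.6)]


def sort_columns_separately(pairs):
    """Models sorting the collected id_2 list and val list independently."""
    id2s = sorted(p[0] for p in pairs)
    vals = sorted(p[1] for p in pairs)
    return id2s, vals


def sort_pairs_together(pairs):
    """Sorts (id_2, val) pairs by id_2 first, then splits them into two lists."""
    ordered = sorted(pairs)
    return [p[0] for p in ordered], [p[1] for p in ordered]


if __name__ == "__main__":
    print(sort_columns_separately(PAIRS_234))
    print(sort_pairs_together(PAIRS_234))
```

With independent sorting, id_2 457 lines up with val 0.5 even though its row has 0.7. Sorting the pairs together keeps each val next to its own id_2.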