Concatenating arrays in a Hive query

Date: 2020-08-19 17:05:35

Tags: sql hadoop hive hiveql

I have a Hive table:

col1   col2
1     ["apple", "orange"]
1     ["orange", "banana"]
1     ["mango"]
2     ["apple"]
2     ["apple", "orange"]

with these data types:

col1 int
col2 array<string>

I want to run a query along these lines:

select col1, concat(col2) from table group by col1;

The output should be:

1    ["apple", "orange", "banana", "mango"]
2    ["apple", "orange"]

Is there a function in Hive that does this?

I also write this data to a CSV file, and when I read it back as a dataframe, col2 has dtype object. Is there a way to get it back as an array?

1 answer:

Answer 0 (score: 1)

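Try exploding the array, then use the collect_set function while grouping by col1. collect_set keeps each distinct element once, so duplicates across rows are dropped.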

Example:

Input:

select * from table;
OK
dd.col1 dd.col2
1       ["apple","orange"]
1       ["mango"]
1       ["orange","banana"]

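-- Explode each array into one row per element (tt1),
-- then collect the distinct elements for each col1.
select col1, collect_set(tt1) as col2 from (
   select * from table lateral view explode(col2) tt as tt1
) cc
group by col1;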

Output:

col1    col2
1       ["apple","orange","mango","banana"]
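
For the CSV part of the question, here is a minimal Python sketch of one way to turn the col2 text back into a list. It assumes each CSV cell holds the array as JSON-like text such as ["apple","orange"], exactly as the Hive CLI prints it; the real export format depends on how the file is written, so check it first. The helper name parse_array and the sample data are hypothetical, made up for illustration.

```python
import csv
import io
import json


def parse_array(cell):
    """Turn a cell such as '["apple","orange"]' into a Python list.

    Assumes the cell is valid JSON; an empty cell becomes an empty list.
    """
    return json.loads(cell) if cell else []


# Hypothetical CSV content standing in for the exported file.
sample = (
    'col1,col2\n'
    '1,"[""apple"",""orange"",""mango"",""banana""]"\n'
    '2,"[""apple"",""orange""]"\n'
)

# Read each row and convert col1 to int and col2 to a real list.
rows = [
    {"col1": int(r["col1"]), "col2": parse_array(r["col2"])}
    for r in csv.DictReader(io.StringIO(sample))
]

for row in rows:
    print(row["col1"], row["col2"])
```

With pandas, the same helper can be passed through the converters parameter of read_csv, for example pd.read_csv(path, converters={"col2": parse_array}). The column dtype will still be reported as object, because pandas has no dedicated list dtype, but each cell will then hold a real Python list rather than a string.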