I have a Hive table:
col1 col2
1 ["apple", "orange"]
1 ["orange", "banana"]
1 ["mango"]
2 ["apple"]
2 ["apple", "orange"]
with the data types:
col1 int
col2 array<string>
I want to run a query along these lines:
select col1, concat(col2) from table group by col1;
The output should be:
1 ["apple", "orange", "banana", "mango"]
2 ["apple", "orange"]
Is there a function in Hive that does this?
I also write this data to CSV, and when I read it back as a dataframe, col2 has dtype object. Is there a way to get it as an array?
Answer 0 (score: 1)
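Try exploding the array, then apply the collect_set function while grouping by col1. collect_set drops duplicate elements, and the order of the elements in the result is not guaranteed.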
Example:
Input:
select * from table;
OK
dd.col1 dd.col2
1 ["apple","orange"]
1 ["mango"]
1 ["orange","banana"]
-- Flatten each array into one row per element, then regroup per col1.
select col1, collect_set(tt1) as col2
from (
  select * from table lateral view explode(col2) tt as tt1
) cc
group by col1;
Output:
col1 col2
1 ["apple","orange","mango","banana"]
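On the CSV part of the question: a minimal sketch of turning the text form of col2 back into a list when reading the file. It assumes the array was written to the CSV as JSON-style text such as ["apple", "orange"]; the sample CSV content and the helper names parse_array and read_rows are made up for illustration, and a different export format would need a different parser.

```python
import csv
import io
import json

# Hypothetical CSV content. Assumption: col2 was exported as JSON-style text.
CSV_TEXT = '''col1,col2
1,"[""apple"", ""orange"", ""banana"", ""mango""]"
2,"[""apple"", ""orange""]"
'''


def parse_array(cell):
    """Convert the text '["a", "b"]' into the Python list ['a', 'b']."""
    return json.loads(cell)


def read_rows(text):
    """Read the CSV text and return (col1, col2) pairs with col2 as a list."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["col1"]), parse_array(row["col2"])) for row in reader]


rows = read_rows(CSV_TEXT)
for col1, col2 in rows:
    print(col1, col2)
```

If the dataframe is a pandas one, the same parser can be passed to read_csv through its converters argument, for example converters={"col2": json.loads}. The column will most likely still report dtype object, since pandas stores Python lists in object columns, but each cell is then a real list rather than a string.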