Hive query: array as a field

Date: 2018-05-18 12:22:32

Tags: hadoop hive hiveql

I have two Hive tables:

Client table:

id,name,salary 
1 ,John, 10000
2 ,Melissa, 5000

Account table:

id,account_number,client_id
1 ,00920202, 1
2 ,00920203, 1
3 ,00920204, 1
4 ,00920205, 2
5 ,00920206, 2

I need a Hive query that returns this result:

id,name,salary,accounts
1 ,John, 10000, {00920202, 00920203, 00920204}
2 ,Melissa, 5000, {00920205, 00920206}

Thanks in advance.

1 answer:

Answer 0: (score: 2)

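Join the two tables on client_id, group by the client columns, and aggregate the account numbers into an array with collect_list:

select c.id, c.name, c.salary, collect_list(a.account_number) as all_accounts
from client c
join account a on a.client_id = c.id
group by c.id, c.name, c.salary

If you are sure the account numbers are unique, collect_list is enough. Otherwise use collect_set, which eliminates duplicates.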
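To make the difference between collect_list and collect_set concrete, here is a small Python sketch that mimics the join and group-by on the sample rows from the question. It only emulates the semantics and is not Hive code; the function name collect_accounts and its dedupe flag are made up for illustration. Note that the sketch keeps insertion order, whereas Hive does not guarantee the order of elements in the collected array.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, name, salary)
clients = [(1, "John", 10000), (2, "Melissa", 5000)]

# (id, account_number, client_id); account numbers kept as strings
# so the leading zeros survive
accounts = [
    (1, "00920202", 1),
    (2, "00920203", 1),
    (3, "00920204", 1),
    (4, "00920205", 2),
    (5, "00920206", 2),
]


def collect_accounts(clients, accounts, dedupe=False):
    """Hypothetical helper: emulate join + group by + collect_list.

    With dedupe=True it behaves like collect_set and drops repeated values.
    """
    grouped = defaultdict(list)
    for _account_id, number, client_id in accounts:
        if dedupe and number in grouped[client_id]:
            continue  # collect_set semantics: skip a value already collected
        grouped[client_id].append(number)
    # Inner join: a client with no accounts does not appear in the result
    return [
        (cid, name, salary, grouped[cid])
        for cid, name, salary in clients
        if cid in grouped
    ]


for row in collect_accounts(clients, accounts):
    print(row)
```

On the sample data both modes give the same result, because every account number occurs once. They differ only when the same account_number appears more than once for a client.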
