I have two Hive tables:
client table:
id, name, salary
1, John, 10000
2, Melissa, 5000
account table:
id, account_number, client_id
1, 00920202, 1
2, 00920203, 1
3, 00920204, 1
4, 00920205, 2
5, 00920206, 2
I need a Hive query that returns this result:
id, name, salary, accounts
1, John, 10000, {00920202, 00920203, 00920204}
2, Melissa, 5000, {00920205, 00920206}
Thanks in advance.
Answer (score: 2):
Use collect_list if you are sure the account numbers are unique. Otherwise use collect_set, which eliminates duplicates.
select c.id, c.name, c.salary, collect_list(a.account_number) as accounts
from client c
join account a on a.client_id = c.id
group by c.id, c.name, c.salary;
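To show what the join plus GROUP BY computes, and how collect_list differs from collect_set, here is a minimal plain-Python model of the query. It is a sketch, not Hive code: the function name collect_accounts and its distinct flag are hypothetical. Python keeps insertion order, whereas Hive does not promise any particular element order in the collected array.

```python
# Plain-Python model of the HiveQL answer above (illustration only, not Hive).
# Hypothetical helper: collect_accounts(clients, accounts, distinct=False)

CLIENTS = [(1, "John", 10000), (2, "Melissa", 5000)]  # (id, name, salary)
ACCOUNTS = [                                          # (id, account_number, client_id)
    (1, "00920202", 1),
    (2, "00920203", 1),
    (3, "00920204", 1),
    (4, "00920205", 2),
    (5, "00920206", 2),
]


def collect_accounts(clients, accounts, distinct=False):
    """Mimic: SELECT c.id, c.name, c.salary, collect_list(a.account_number)
    FROM client c JOIN account a ON a.client_id = c.id
    GROUP BY c.id, c.name, c.salary.
    With distinct=True it mimics collect_set instead of collect_list."""
    groups = {}
    for cid, name, salary in clients:
        for _account_id, number, client_id in accounts:
            if client_id == cid:  # inner join: clients without accounts never get a group
                groups.setdefault((cid, name, salary), []).append(number)

    rows = []
    for (cid, name, salary), numbers in groups.items():
        if distinct:
            # collect_set semantics: drop duplicate account numbers
            numbers = list(dict.fromkeys(numbers))
        rows.append((cid, name, salary, numbers))
    return rows


if __name__ == "__main__":
    for row in collect_accounts(CLIENTS, ACCOUNTS):
        print(row)
```

With the sample data every account number is unique, so both variants return the same arrays; the difference only appears once a number repeats for the same client, where collect_list keeps both copies and collect_set keeps one.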