Pivoting ID, property, value records into columns in Hive

Time: 2019-07-12 18:21:11

Tags: sql hive group-by

I have a table in the following format:

ID   Property  Value
1    name      Tim
1    location  USA
1    age       30
2    name      Jack
2    location  UK
2    age       27

I want output in the following format:

ID  name  location  age
1   Tim   USA       30
2   Jack  UK        27

In Python (pandas) I can do:

import pandas as pd

# Build one {Property: Value} dict per ID, then expand the dicts into columns
table_agg = table.groupby('ID')[['Property','Value']].apply(lambda x: dict(x.values))
p = pd.DataFrame(list(table_agg))

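How can I write this query in Hive?

1 answer:

Answer 0: (score: 2)

You can use the collect_list and map functions to group the data by id, then read each value from the resulting array by position and key. Note that the positions (va[0], va[1], va[2]) assume the rows for each id are collected in the order name, location, age, which collect_list does not guarantee.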

Example:

hive> create table t1(id int, property string, value string) stored as orc;
hive> insert into t1 values (1,"name","Tim"),(1,"location","USA"),(1,"age","30"),(2,"name","Jack"),(2,"location","UK"),(2,"age","27");

hive> select id,
             va[0]["name"]     as name,
             va[1]["location"] as location,
             va[2]["age"]      as age
      from (
            -- one single-entry map per row, collected into an array per id
            select id, collect_list(map(property, value)) as va
            from t1
            group by id
           ) t;

Result:

id      name    location        age
1       Tim     USA             30
2       Jack    UK              27
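
An order-independent alternative, offered as a sketch and not part of the original answer: conditional aggregation selects each property by name instead of by array position. It assumes the example table t1 with its third column named value, and at most one value per (id, property) pair.

```sql
-- Pivot property/value rows into columns with conditional aggregation.
-- Assumption: table t1(id int, property string, value string).
select id,
       max(case when property = 'name'     then value end) as name,
       max(case when property = 'location' then value end) as location,
       max(case when property = 'age'      then value end) as age
from t1
group by id;
```

If an id has no row for a given property, that column is NULL for the id. The property names have to be known when the query is written, since each one becomes its own output column.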