Joins in Hive

Time: 2017-03-09 12:08:17

Tags: hive hiveql

I have a table in Hive with the following columns:

userid                  string
attribute_name          string
attribute_value         string

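attribute_name can take values such as age, gender, and so on. attribute_value is the value for that name, for example M for gender. What I want is a table that, for each userid, gives all the values aggregated for a particular attribute_name. For example, given this sample table: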

userid    attribute_name    attribute_value
1000      gender            M
1000      city              Perth
1000      city              Singapore
1001      gender            F
1001      city              Tokyo
1001      gender            M
1002      city              Bombay

I would like to get:

1000      {M}     {Perth, Singapore}
1001      {F,M}   {Tokyo}

The braces are only there for clarity.

I could build two separate tables and then join them, but I am trying to do it in a single step:

select userid, count (DISTINCT table.attribute_value) as numgender, collect_set(table.attribute_value) as genders
from table where attribute_name == "gender" GROUP BY table.userid

Can the same be done for city as well, all in one query?

1 Answer:

Answer 0 (score: 2)

select      userid
           ,concat_ws(',',collect_list (case when attribute_name = 'gender' then attribute_value end)) as genders
           ,concat_ws(',',collect_list (case when attribute_name = 'city'   then attribute_value end)) as cities

from        mytable

group by    userid
;
+--------+---------+-----------------+
| userid | genders |     cities      |
+--------+---------+-----------------+
|   1000 | M       | Perth,Singapore |
|   1001 | F,M     | Tokyo           |
|   1002 |         | Bombay          |
+--------+---------+-----------------+
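
The question used collect_set and a distinct count rather than collect_list. A minimal sketch of the same conditional aggregation with those choices follows. It assumes the table is named mytable, as in the query above, and relies on the fact that a case expression without an else branch yields NULL, which collect_set skips. The column name numgender comes from the question; using size() to count the distinct values is my own substitution for count(DISTINCT ...).

```sql
-- Sketch only: assumes the table is called mytable, as in the answer above.
-- Each case expression is NULL for rows of other attributes, and collect_set
-- ignores NULLs, so every column aggregates one attribute only.
select      userid
           ,size(collect_set(case when attribute_name = 'gender' then attribute_value end)) as numgender
           ,collect_set(case when attribute_name = 'gender' then attribute_value end)       as genders
           ,collect_set(case when attribute_name = 'city'   then attribute_value end)       as cities

from        mytable

group by    userid
;
```

Here genders and cities come back as arrays with duplicates removed. Wrapping them in concat_ws(',', ...) as above would turn them into comma-separated strings. The order of elements within each array is not guaranteed.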

To filter out user IDs that have no gender, add:

having count (case when attribute_name = 'gender' then 1 end) > 0
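
Put together, the full statement might look like the sketch below. It only combines the two fragments from this answer; the having clause goes after group by. With the sample data from the question, userid 1002 has no gender row, so its count is 0 and it drops out of the result.

```sql
-- Sketch: the answer's query plus its having clause, against the same mytable.
select      userid
           ,concat_ws(',',collect_list (case when attribute_name = 'gender' then attribute_value end)) as genders
           ,concat_ws(',',collect_list (case when attribute_name = 'city'   then attribute_value end)) as cities

from        mytable

group by    userid

-- keep only users with at least one gender row
having      count (case when attribute_name = 'gender' then 1 end) > 0
;
```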