I have a table in Hive with the following columns
userid string
attribute_name string
attribute_value string
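attribute_name can take values such as age, gender, and so on. attribute_value is the value for that name, e.g. M for gender. What I want is a table that gives, for each userid, all of the values aggregated per attribute_name. For example, if this is a sample table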
userid attribute_name attribute_value
1000 gender M
1000 city Perth
1000 city Singapore
1001 gender F
1001 city Tokyo
1001 gender M
1002 city Bombay
I would like to get
1000 {M} {Perth, Singapore}
1001 {F,M} {Tokyo}
The braces are only there for clarity.
I can build two separate tables and then perhaps join them, but I am trying to do it in one step
select userid, count (DISTINCT table.attribute_value) as numgender, collect_set(table.attribute_value) as genders
from table where attribute_name == "gender" GROUP BY table.userid
and similarly for city. Can this be done in a single query?
Answer 0 (score: 2)
select userid
,concat_ws(',',collect_list (case when attribute_name = 'gender' then attribute_value end)) as genders
,concat_ws(',',collect_list (case when attribute_name = 'city' then attribute_value end)) as cities
from mytable
group by userid
;
+--------+---------+-----------------+
| userid | genders | cities |
+--------+---------+-----------------+
| 1000 | M | Perth,Singapore |
| 1001 | F,M | Tokyo |
| 1002 | | Bombay |
+--------+---------+-----------------+
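If duplicate values should be collapsed, as the braces in the question suggest, the same conditional aggregation could be written with collect_set instead of collect_list. This is only a sketch: it assumes collect_set skips the NULLs produced by the CASE expression in the same way collect_list does in the query above, and num_genders is an illustrative column name standing in for the count(DISTINCT ...) from the question. The order of elements inside each set is not guaranteed.

```sql
-- Sketch: conditional aggregation with de-duplicated values.
-- mytable is the table name used in the answer; num_genders is an illustrative alias.
select   userid
        ,size(collect_set(case when attribute_name = 'gender' then attribute_value end))          as num_genders
        ,concat_ws(',',collect_set(case when attribute_name = 'gender' then attribute_value end)) as genders
        ,concat_ws(',',collect_set(case when attribute_name = 'city'   then attribute_value end)) as cities
from     mytable
group by userid
;
```

Dropping the concat_ws wrapper would leave each column as an array rather than a comma-separated string, which may be more convenient if the values are processed further.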
To filter out user IDs that have no gender, add:
having count (case when attribute_name = 'gender' then 1 end) > 0
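For reference, this is roughly how the clause fits into the full query. It goes after group by, because the filter is applied to each userid group rather than to individual rows.

```sql
-- Sketch: the answer's query with the HAVING filter in place.
select   userid
        ,concat_ws(',',collect_list(case when attribute_name = 'gender' then attribute_value end)) as genders
        ,concat_ws(',',collect_list(case when attribute_name = 'city'   then attribute_value end)) as cities
from     mytable
group by userid
-- keep only users that have at least one gender row
having   count(case when attribute_name = 'gender' then 1 end) > 0
;
```

With the sample data from the question, user 1002 has no gender row, so only 1000 and 1001 remain, which matches the output the question asks for.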