Group by multiple columns and then collect a list - Hive

Time: 2015-08-03 00:03:20

Tags: hive

I have two tables, posts and tags.

posts:
account_id post_id
1 1
1 2
2 2
2 3
2 5
3 4

tags:
account_id post_id tag_id
1 1 21
1 2 22
1 2 26
2 2 28
2 3 23
2 3 24
2 3 25
2 5 27

Now, to get all the tags associated with each post at the account level, I ran the following join.

CREATE TABLE posts_tags1
AS
SELECT a.account_id, a.post_id, b.tag_id
FROM posts a
LEFT OUTER JOIN tags b
  ON a.account_id = b.account_id
 AND a.post_id = b.post_id;

The result I get is:

posts_tags1:
account_id post_id tag_id
1 1 21
1 2 22
1 2 26
2 2 28
2 3 23
2 3 24
2 3 25
2 5 27
3 4 NULL

Now I want to transform the above result into:

post_tags:
account_id post_id tag_ids
1 1 [21]
1 2 [22,26]
2 2 [28]
2 3 [23,24,25]
2 5 [27]
3 4 []

Can anyone help me achieve this?

1 answer:

Answer 0: (score: 0)

How about:

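-- Reads from posts_tags1, the joined table created in the question.
-- collect_list skips NULL values, so the (3, 4) row with no tags gets an empty array.
CREATE TABLE post_tags
AS
SELECT account_id, post_id, collect_list(tag_id) AS tag_ids
FROM posts_tags1
GROUP BY account_id, post_id;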
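
The intermediate joined table is not strictly needed. As a rough sketch, assuming the posts and tags tables from the question and a Hive version that provides collect_list (0.13 or later, as far as I know), the join and the aggregation can be done in one statement. The table name post_tags_sorted is a hypothetical name chosen for illustration, and wrapping the list in sort_array is an optional addition, since collect_list makes no promise about element order.

```sql
-- Sketch only: post_tags_sorted is a hypothetical table name.
-- Assumes the posts and tags tables from the question.
-- collect_list skips NULLs, so posts without tags get an empty array.
-- sort_array orders each list ascending (collect_list guarantees no order).
CREATE TABLE post_tags_sorted
AS
SELECT
  a.account_id,
  a.post_id,
  sort_array(collect_list(b.tag_id)) AS tag_ids
FROM posts a
LEFT OUTER JOIN tags b
  ON a.account_id = b.account_id
 AND a.post_id = b.post_id
GROUP BY a.account_id, a.post_id;
```

If duplicate tag ids should be dropped, collect_set can be used in place of collect_list.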