Pig script: merging rows after a join and group

Time: 2016-10-06 12:09:50

Tags: merge dataset apache-pig rows movie

Movie table:

id  movie  genre
1   ABC    A|B|C
2   DEF    D|A|F

Each movie can have several genres, separated by the | delimiter.

Ratings table:

user_id  movie_id  rating
1        1         3.5
1        2         4.5

Result:

I want one row per user_id, with the genres of every movie that user rated merged into a single string:

user_id  genres
1        (A|B|C|D|A|F)

Code:

genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by (user_id);
user1_data = foreach genre_data generate ratings::user_id, movie::genre;

1 answer:

Answer 0 (score: 1)

You can achieve this as follows:

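-- Join each rating to its movie, then collect the joined rows per user.
genre_data = join movie by id, ratings by movie_id;
genre_data = group genre_data by user_id;

user_data = foreach genre_data {
    -- Project only the genre column out of each user's bag of joined rows.
    genres = foreach genre_data generate movie::genre as genres;
    -- Concatenate the genre strings, using '|' as the separator.
    generate group as user_id, BagToString(genres, '|');
};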
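
For anyone who wants to try this end to end, here is a minimal sketch of the same approach as a complete script. The file names (movies.tsv, ratings.tsv), the tab-delimited layout without a header row, the column types, and the aliases joined, grouped, user_genres and all_genres are assumptions made for illustration; none of them come from the question. It also assumes a Pig version that supports a nested FOREACH and the built-in BagToString.

```pig
-- Hypothetical input files: tab-delimited, no header row. Adjust to your data.
movie   = LOAD 'movies.tsv'  USING PigStorage('\t')
          AS (id:int, movie:chararray, genre:chararray);
ratings = LOAD 'ratings.tsv' USING PigStorage('\t')
          AS (user_id:int, movie_id:int, rating:double);

-- One row per (movie, rating) pair.
joined  = JOIN movie BY id, ratings BY movie_id;

-- One row per user; the bag field 'joined' holds that user's joined rows.
grouped = GROUP joined BY ratings::user_id;

user_genres = FOREACH grouped {
    -- Keep only the genre string of each joined row.
    genres = FOREACH joined GENERATE movie::genre;
    -- Glue the genre strings together with '|'.
    GENERATE group AS user_id, BagToString(genres, '|') AS all_genres;
};

DUMP user_genres;
```

Separate aliases are used for the join and the group only to make it easier to see which relation the nested FOREACH iterates over. Note that Pig bags are unordered, so the genres of different movies may come out in a different order than in the example result; duplicates such as the repeated A are kept.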