Apache Pig: permutations of values grouped by id

Asked: 2017-03-31 22:19:39

Tags: apache-pig permutation

Using Apache Pig, I need all permutations of one field, grouped by an id field (in this case 'title'). The input data looks like this:

The schema is {chararray, chararray}

(title1, name1)
(title1, name2)
(title1, name3)
(title2, name4)
(title2, name5)
(title2, name6)

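I need all permutations of the names under title1 and of the names under title2, combined into a single list, with each unordered pair appearing once. The desired output is: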

(name1, name2)
(name1, name3)
(name2, name3)
(name4, name5)
(name4, name6)
(name5, name6)
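
To pin down the expected result independently of Pig, the pairing can be modeled in plain Python. This is only a sketch: the function name `pairs_per_title` is hypothetical, and it assumes that names within a title are distinct and that each pair should be listed in sorted order, as in the output above.

```python
from collections import defaultdict
from itertools import combinations

def pairs_per_title(rows):
    """Return every unordered pair of names that share a title."""
    names_by_title = defaultdict(list)
    for title, name in rows:
        names_by_title[title].append(name)
    result = []
    for title in sorted(names_by_title):
        # combinations() emits each 2-element subset exactly once
        result.extend(combinations(sorted(names_by_title[title]), 2))
    return result

rows = [("title1", "name1"), ("title1", "name2"), ("title1", "name3"),
        ("title2", "name4"), ("title2", "name5"), ("title2", "name6")]
print(pairs_per_title(rows))
```

Pairs never cross title boundaries, which is the restriction the question is about.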

I found the related answer "How To Find All Possible Permutations From A Bag under apache pig", but I am having trouble extending that solution so that the permutations are restricted to each title.

1 Answer:

Answer 0: (score: 0)

After more searching, the following two posts, "How To Find All Possible Permutations From A Bag under apache pig" and "PIG: Get all tuples out of a grouped bag", led me to this solution:

The input schema is {chararray, chararray}

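-- 'raw' is a placeholder name for the loaded relation ('input' is a reserved word in Pig Latin)
inpt = foreach raw generate $0 as id:chararray, $1 as val:chararray;
grp = group inpt by id;
id_grp = foreach grp generate group as id, inpt.val as value_bag;
-- two FLATTENs in one GENERATE produce the cross product of the bag with itself, per id
pairs = foreach id_grp generate FLATTEN(value_bag) as v1, FLATTEN(value_bag) as v2;
-- v1 < v2 combines the two original filters (v1 <= v2, then v1 != v2):
-- it drops self-pairs and keeps each unordered pair once
result = filter pairs by v1 < v2;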
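
The logic of the script can be checked outside Pig with a small Python model. This is a sketch under stated assumptions, not a translation of Pig's execution: the function name `cross_then_filter` is hypothetical, and the result is sorted only to make the output deterministic (Pig does not guarantee row order). Each group's bag is crossed with itself, and only pairs whose first value is strictly less than the second are kept, which is what the filtering above amounts to.

```python
from collections import defaultdict

def cross_then_filter(rows):
    """Group by id, cross each bag with itself, keep pairs with v1 < v2."""
    bags = defaultdict(list)
    for key, val in rows:              # models: group inpt by id
        bags[key].append(val)
    pairs = []
    for bag in bags.values():
        # models the two FLATTENs: a per-group cross product
        cross = [(v1, v2) for v1 in bag for v2 in bag]
        # (v1 <= v2) and (v1 != v2) is the same condition as v1 < v2
        pairs.extend(p for p in cross if p[0] < p[1])
    return sorted(pairs)               # sorted only for a stable result

rows = [("title1", "name1"), ("title1", "name2"), ("title1", "name3"),
        ("title2", "name4"), ("title2", "name5"), ("title2", "name6")]
print(cross_then_filter(rows))
```

Because each bag is crossed with itself, the intermediate relation holds n² rows for a group of n values before filtering, so this approach can become expensive for large groups.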