I have a relation named stuff with the following schema:
grunt> describe stuff;
stuff: {child_id: long,parent_id: long}
I want to group the child_ids by parent_id and then output a list of ordered pairs of distinct child_ids. For example, if parent_id 100 has child_ids 1, 2, 4 and 5, the output I want looks like this:
1,2
1,4
1,5
2,1
2,4
2,5
4,1
4,2
4,5
5,1
5,2
5,4
Do I have to write an eval function?
Answer 0 (score: 1)
You need the CROSS operator. Here is an example:
INPUT
1,2
1,1
1,3
1,4
2,5
2,3
2,6
CODE
inpt = load 'parent_child.csv' using PigStorage(',') as (parent_id: long, child_id: long);
tmp = foreach inpt generate parent_id, child_id as b1, child_id as b2; -- needed to use CROSS in the nested FOREACH
parentGroup = group tmp by parent_id;
perms = foreach parentGroup {
    bro_1 = tmp.b1;
    bro_2 = tmp.b2;
    brothers = cross bro_1, bro_2;
    brothers = filter brothers by b1 != b2; -- remove relationship to itself
    generate group as parent_id, brothers;
}
OUTPUT
schema - perms: {parent_id: long,brothers: {(bro_1::b1: long,bro_2::b2: long)}}
(1,{(1,2),(3,2),(4,2),(2,1),(3,1),(4,1),(2,3),(1,3),(4,3),(2,4),(1,4),(3,4)})
(2,{(3,5),(6,5),(5,3),(6,3),(5,6),(3,6)})
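The result above keeps the pairs nested in one bag per parent_id, whereas the question asks for one pair per line. A minimal sketch of that last step, assuming the perms relation from the script above; the field names first_child and second_child and the output directory ordered_pairs are hypothetical names chosen for illustration:

```pig
-- Assumes the perms relation defined in the script above.
-- FLATTEN un-nests the brothers bag, so each ordered pair becomes its own row.
pairs = foreach perms generate flatten(brothers) as (first_child, second_child);

-- 'ordered_pairs' is a hypothetical output directory;
-- PigStorage(',') writes each row as comma-separated values.
store pairs into 'ordered_pairs' using PigStorage(',');
```

Pig does not guarantee the order of tuples inside a bag, so if the pairs must appear sorted, an ORDER BY on both fields before the STORE would probably be needed.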