How do I sum two fields from different tables in Pig?

Time: 2014-06-12 15:06:21

Tags: sum field apache-pig

Example table:

A = LOAD 'data' AS (a1:int,a2:int);

DUMP A;
(1,2)
(1,3)
(2,2)
(3,4)
(3,1)

After grouping, I have:

A2 = GROUP A BY a1;

DUMP A2;
(1,{(1,2),(1,3)})
(2,{(2,2)})
(3,{(3,4),(3,1)})

B = LOAD 'data2' AS (b1:int,b2:int);

DUMP B;
(1,4)
(2,3)
(3,2)

The result I want is:

(1,{(1,6),(1,7)})
(2,{(2,5)})
(3,{(3,6),(3,3)})

That is, something like

FOREACH A2 GENERATE group,A.a2+B.b2 

WHERE A.a1 == B.b1, but this fails with the error:

Invalid scalar projection: B

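Any ideas would be great, thanks.

1 answer:

Answer 0: (score: 1)

You probably have to join the two relations first, then add the fields, and only then group.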

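-- Pair each row of A with the row of B that has the same key.
joined_data = JOIN A BY a1, B BY b1;
-- Add the two value fields, keeping the key.
summed_data = FOREACH joined_data GENERATE a1 AS a1, a2+b2 AS sum;
-- Collect the sums into one bag per key.
final_answer = GROUP summed_data BY a1;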
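
For reference, the join-then-add-then-group approach can be sketched as one script, starting from the two LOAD statements in the question. This is only a sketch under assumptions: it assumes 'data' and 'data2' are in Pig's default tab-delimited format, it uses the `relation::field` form to make explicit which relation each field comes from after the JOIN, and the alias `total` is a name chosen here for illustration rather than anything from the original post.

```pig
-- Assumption: 'data' and 'data2' are tab-delimited files readable by the default loader.
A = LOAD 'data' AS (a1:int, a2:int);
B = LOAD 'data2' AS (b1:int, b2:int);

-- Pair each row of A with the row of B that shares its key.
joined_data = JOIN A BY a1, B BY b1;

-- After a JOIN, fields can be addressed as relation::field.
-- 'total' is an illustrative alias, not part of the original answer.
summed_data = FOREACH joined_data GENERATE A::a1 AS a1, A::a2 + B::b2 AS total;

-- One bag of (a1, total) tuples per key.
final_answer = GROUP summed_data BY a1;

DUMP final_answer;
```

With the sample rows from the question, the joined rows sum to (1,6), (1,7), (2,5), (3,6) and (3,3), so the grouped output should match the desired result, although Pig does not guarantee the order of tuples inside each bag.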