PIG Latin中FLATTEN运算符的用途是什么?

时间:2015-02-12 07:59:36

标签: hadoop apache-pig

A =加载数据' as(x,y);

B =加载数据' as(x,z);

C = cogroup A by x,B by x;

D = foreach C产生展平(A),展平(b);

E = D组由A :: x

组成

在上述陈述中完成了什么以及我们在实时场景中使用扁平化的方式。

1 个答案:

答案 0 :(得分:-1)

A = load 'input1'   USING PigStorage(',') as (x, y);
(x,y) --> (1,2)(1,3)(2,3)
B = load 'input2'  USING PigStorage(',') as (x, z);`
(x,z) --> (1,4)(1,2)(3,2)*/
C = cogroup A by x, B by x;`

result:

(1,{(1,2),(1,3)},{(1,4),(1,2)})
(2,{(2,3)},{})
(3,{},{(3,2)})


D = foreach C generate group, flatten(A), flatten(B);`

when both bags flattened, the cross product of tuples are returned.  

result:
(1,1,2,1,4)
(1,1,2,1,2)
(1,1,3,1,4)
(1,1,3,1,2)  

E = group D by A::x`

here your are grouping with x column of relation A.

(1,1,2,1,4)    (1,1,2,1,2)    (1,1,3,1,4)    (1,1,3,1,2)