How do I compute the combinations of two fields with a Pig script?

Time: 2014-12-22 11:12:52

Tags: hadoop apache-pig

My input looks like this:

(1, (a, b, c))
(2, (e, f, g))

The output I expect is:

(1, a)
(1, b)
(1, c)
(2, e)
(2, f)
(2, g)

3 answers:

Answer 0 (score: 0)

This may help you:

A = LOAD 'data' AS (a:int, t1:tuple(t1a:chararray, t1b:chararray, t1c:chararray));

B = FOREACH A GENERATE a,t1.$0,t1.$1,t1.$2;

C = group B by a;


-- After GROUP, the key field of C is named "group"; C has no field called "a".
X = COGROUP C BY group, C BY $0;

DUMP X;

Answer 1 (score: 0)

Can you try this?

A = LOAD 'input.txt' USING PigStorage() AS (f1:int,T:tuple(f2:chararray,f3:chararray,f4:chararray));
B = FOREACH A GENERATE f1,FLATTEN(TOBAG(T.f2,T.f3,T.f4));
DUMP B;
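
To see why FLATTEN(TOBAG(...)) yields one row per tuple element, here is a rough Python model of the semantics. It is only an illustration, not Pig code: the helper names to_bag and flatten_bag are hypothetical, and the rows are hard-coded from the question's sample input.

```python
def to_bag(*values):
    """Rough stand-in for Pig's TOBAG: wrap each value in a one-field tuple."""
    return [(v,) for v in values]


def flatten_bag(prefix, bag):
    """Rough stand-in for FLATTEN on a bag: one output row per tuple in the bag."""
    return [prefix + t for t in bag]


# Sample rows from the question: (f1, T)
rows = [(1, ("a", "b", "c")), (2, ("e", "f", "g"))]

result = []
for f1, t in rows:
    # Mirrors: GENERATE f1, FLATTEN(TOBAG(T.f2, T.f3, T.f4))
    result.extend(flatten_bag((f1,), to_bag(*t)))

for row in result:
    print(row)
```

Each input row is repeated once per element of the bag, which is the (id, value) shape the question asks for.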

Answer 2 (score: 0)

Step 1: Load the input file

1 a,b,c

2 e,f,g

with

crude_input = LOAD '' USING PigStorage() AS (id:int, ip_tuple:tuple(val1:chararray, val2:chararray, val3:chararray));

dump crude_input;

(1,(a,b,c))

(2,(e,f,g))

Step 2:

crude_flatened = FOREACH crude_input GENERATE id, FLATTEN($1);

This produces:

(1,a,b,c)

(2,e,f,g)
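
Unlike FLATTEN on a bag, FLATTEN on a tuple widens the row instead of multiplying it, which is why a further step is still needed. A small Python sketch of that behaviour follows; flatten_tuple is a hypothetical helper used only for illustration, with the sample rows hard-coded.

```python
def flatten_tuple(row, index):
    """Rough stand-in for FLATTEN on a tuple field: splice its elements into the row."""
    return row[:index] + tuple(row[index]) + row[index + 1:]


# Rows as loaded in Step 1: (id, ip_tuple)
crude_input = [(1, ("a", "b", "c")), (2, ("e", "f", "g"))]

# Mirrors: FOREACH crude_input GENERATE id, FLATTEN($1)
crude_flattened = [flatten_tuple(row, 1) for row in crude_input]

for row in crude_flattened:
    print(row)
```

The row count stays at two; only the number of fields per row changes.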

Step 3:

output_data = FOREACH crude_flatened GENERATE id, FLATTEN(TOBAG(ip_tuple::val1, ip_tuple::val2, ip_tuple::val3));

(1,a)

(1,b)

(1,c)

(2,e)

(2,f)

(2,g)