Data normalization in a Pig script

Time: 2012-10-12 21:34:25

Tags: apache-pig

I have the following dataset:

1,11,ab;cd;200

2,22,pq;rs

I want this output:

1,11,ab

1,11,cd

1,11,200

2,22,pq

2,22,rs

How can this be done in Pig without using any UDF?

2 answers:

Answer 0: (score: 0)

You can do it like this:

A = load '....' using PigStorage(',') as (x,y,data : chararray);
SPLT = foreach A generate x, y, FLATTEN(STRSPLIT(data,';'));
X_tmp = foreach SPLT generate $0 as x, $1 as y, FLATTEN(TOBAG($2..$20)) as term; -- pivots the row
X = filter X_tmp by term is not null; -- removes the null rows left over when data was split into fewer terms than the range covers

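This assumes the data string never contains more elements than the range covers ($2..$20 holds 19 of them). If you have more, raise the upper bound of the range accordingly.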
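
If the number of elements per row is not known in advance, a possible alternative is the built-in TOKENIZE function, which returns a bag and so needs no fixed upper bound. The following is only a sketch and not part of the answer above: it assumes a Pig release whose TOKENIZE accepts an optional delimiter argument, and the path 'input.txt' and the relation names are placeholders.

```pig
-- Sketch only: assumes TOKENIZE(expression, 'delimiter') is available in your Pig release.
-- 'input.txt' is a placeholder path; replace it with the real input location.
A = LOAD 'input.txt' USING PigStorage(',') AS (x:chararray, y:chararray, data:chararray);

-- TOKENIZE splits data on ';' into a bag of single-field tuples;
-- FLATTEN then emits one output row per element, repeating x and y.
X = FOREACH A GENERATE x, y, FLATTEN(TOKENIZE(data, ';')) AS term;

DUMP X;
```

Because the bag holds only the elements that actually exist, no null filter is needed afterwards. How rows with a null or empty data field come out is worth checking on your Pig version.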

Answer 1: (score: 0)

Try this
