Converting one row into multiple rows in Pig

Date: 2014-12-24 17:18:23

Tags: apache-pig

I want to write a Pig script for the following transformation.

The input is:

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3

The output should be:

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3
ABC,DEF,GHI,JKL,AAA,bbb,1,2,3
ABC,DEF,GHI,JKL,AAA,ccc,1,2,3
ABC,DEF,GHI,JKL,BBB,aaa,1,2,3
ABC,DEF,GHI,JKL,BBB,bbb,1,2,3
ABC,DEF,GHI,JKL,BBB,ccc,1,2,3

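Can anyone help me?

1 answer:

Answer 0 (score: 0)

You can write your own custom UDF, or try the following approach, which uses only built-in functions.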

input.txt:

ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,CCC,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3

Pig script:

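-- Load each line as positional fields $0, $1, ... split on commas.
A = LOAD 'input.txt' USING PigStorage(',');

-- First group: label in $4, followed by three 4-field records.
-- FLATTEN(TOBAG(...)) emits one output row per tuple in the bag.
B = FOREACH A GENERATE $0,$1,$2,$3,
                       FLATTEN(TOTUPLE($4)),
                       FLATTEN(TOBAG(
                                     TOTUPLE($5..$8),
                                     TOTUPLE($9..$12),
                                     TOTUPLE($13..$16)
                                    )
                              );

-- Second group: label in $17, followed by three 4-field records.
C = FOREACH A GENERATE $0,$1,$2,$3,
                       FLATTEN(TOTUPLE($17)),
                       FLATTEN(TOBAG(
                                     TOTUPLE($18..$21),
                                     TOTUPLE($22..$25),
                                     TOTUPLE($26..$29)
                                    )
                              );

-- Only fields $0..$29 are used; a further group (e.g. CCC at $30..$42)
-- would need its own FOREACH and another relation in the UNION.
D = UNION B,C;
DUMP D;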

Output:

(ABC,DEF,GHI,JKL,AAA,aaa,1,2,3)
(ABC,DEF,GHI,JKL,AAA,bbb,1,2,3)
(ABC,DEF,GHI,JKL,AAA,ccc,1,2,3)
(ABC,DEF,GHI,JKL,BBB,aaa,1,2,3)
(ABC,DEF,GHI,JKL,BBB,bbb,1,2,3)
(ABC,DEF,GHI,JKL,BBB,ccc,1,2,3)
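
For the custom-UDF route mentioned at the start of this answer, the reshaping logic itself can be sketched in plain Python. This is only an illustration under assumptions not stated in the question: the layout is taken to be 4 fixed leading fields followed by any number of groups, each made of 1 label plus 3 records of 4 fields. The function name `split_row` and its parameters are hypothetical, and the function would still have to be wrapped as a real Pig UDF (for example through Pig's Jython support) that returns a bag of tuples.

```python
def split_row(line, prefix_len=4, record_len=4, records_per_group=3):
    """Expand one wide comma-separated row into one row per sub-record.

    Assumed layout (hypothetical, inferred from the sample data):
    <prefix_len fixed fields>, then repeated groups of
    <1 label> + <records_per_group records of record_len fields>.
    """
    fields = line.strip().split(",")
    prefix = fields[:prefix_len]
    body = fields[prefix_len:]

    # Each group is one label followed by its sub-records.
    group_len = 1 + record_len * records_per_group
    if len(body) % group_len != 0:
        raise ValueError("row does not match the assumed layout")

    rows = []
    for g in range(0, len(body), group_len):
        label = body[g]
        # Walk the sub-records that follow the label.
        for r in range(g + 1, g + group_len, record_len):
            record = body[r:r + record_len]
            rows.append(",".join(prefix + [label] + record))
    return rows


if __name__ == "__main__":
    sample = ("ABC,DEF,GHI,JKL,AAA,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3,"
              "BBB,aaa,1,2,3,bbb,1,2,3,ccc,1,2,3")
    for row in split_row(sample):
        print(row)
```

Unlike the script above, which hard-codes the field positions of two groups, this sketch loops over however many groups the row contains, so a row with a third group such as CCC would produce nine output rows.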