I have a txt file in the following format:
{ (word1),(word2),(word3),....,(wordn) }
The words are not in quotes. I want to use Apache Pig to change the file's format to:
word1
word2
word3
wordn
Is there a way to do this with Apache Pig?
Answer 0 (score: 0)
Can you try this?
Input:
{ (word1),(word2),(word3),(wordn) }
PigScript1:
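-- Load each input line as a single bag of one-field tuples
A = LOAD 'input' AS (mybag:{T:(line:chararray)});
-- BagToString joins the bag's values with '_' by default; REPLACE turns each '_' into a space
B = FOREACH A GENERATE REPLACE(BagToString(mybag.line),'_',' ');
STORE B INTO 'output';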
Output (stored in the output/part* files):
word1 word2 word3 wordn
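One caveat with PigScript1: the REPLACE step also rewrites any underscore that belongs to a word. A possible variant, assuming your Pig version's BagToString accepts an optional delimiter as its second argument, is to join with a space directly. This is only a sketch, and the output path 'output_spaces' is a made-up name for illustration:

```pig
-- Same load as PigScript1: each input line is parsed as one bag of one-field tuples
A = LOAD 'input' AS (mybag:{T:(line:chararray)});
-- Assumption: this Pig version supports BagToString(bag, delimiter)
-- Joining with ' ' directly leaves underscores inside words untouched
B = FOREACH A GENERATE BagToString(mybag.line, ' ');
-- 'output_spaces' is a hypothetical output path
STORE B INTO 'output_spaces';
```

For the sample input this should produce the same single line as PigScript1, since none of the sample words contain an underscore.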
Update: if you want each word on its own line, use the FLATTEN operator.
PigScript2:
A = LOAD 'input' AS (mybag:{T:(line:chararray)});
-- FLATTEN un-nests the bag, emitting one output row per tuple
B = FOREACH A GENERATE FLATTEN(mybag);
STORE B INTO 'output1';
Output:
word1
word2
word3
wordn