Column-to-row conversion using Pig Latin

Time: 2016-01-26 09:09:01

Tags: hadoop apache-pig bigdata

A = load 'input.txt'; 
dump A;
"0,1, 2,3,4 
5, 6,7, 8,9
B = foreach A generate FLATTEN(TOBAG(*));
dump B;
("0)
(1)
( 2)
(3)
(4)
(5)
( 6)
(7)
( 8)
(9)

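I want to perform some replace and trim operations on each of the fields above. How can I convert the result back to the original format?

Expected output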

0,1,2,3,4

5,6,7,8,9

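2 answers:

Answer 0: (score: 0)

Yes, this really is an experimental question.

Rows-to-columns and columns-to-rows conversion!

With the help of the RANK operator, I think we can achieve this.

I tried the code below on the following input.

Input: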

 0,1,2,3,4 
 5,6,7,8,9

The Pig script contains two dump statements:

numbers = LOAD '/home/inputfiles/col_to_row.txt' USING PigStorage() As(line:chararray);

numbers_rank = RANK numbers;

numbers_each = FOREACH numbers_rank GENERATE  $0 as rank_key,FLATTEN(TOKENIZE(line)) as each_number;

rows_to_columns = FOREACH numbers_each GENERATE each_number;

dump rows_to_columns;--Will give you each number in a separate row..


numbers_grp = GROUP numbers_each BY rank_key;

columns_to_rows = FOREACH numbers_grp GENERATE FLATTEN(BagToTuple(numbers_each.each_number));

dump columns_to_rows; -- Will give you as Per original input data set

Output:

   dump rows_to_columns;

         (0)
         (1)
         (2)
         (3)
         (4)
         (5)
         (6)
         (7)
         (8)
         (9)


   dump columns_to_rows;

         (0,1,2,3,4)
         (5,6,7,8,9)

Answer 1: (score: 0)

You can do a simple replacement with a regular expression. Because the REPLACE function calls Java's String.replaceAll(), you can use Java-compatible regular expressions. Here is a demo:
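
A minimal sketch of such a replacement, not the answerer's original code. It assumes the question's 'input.txt' is loaded with TextLoader so that each whole line arrives as a single chararray, and that the only characters to strip are double quotes and whitespace; the alias names are illustrative.

```pig
-- Assumption: load every line as one chararray so the commas stay inside the field.
A = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- REPLACE takes a Java regular expression. The character class matches any
-- whitespace character or a double quote; each match is replaced with ''.
-- The backslash is doubled because it sits inside a Pig string literal.
B = FOREACH A GENERATE REPLACE(line, '[\\s"]', '') AS line;

DUMP B;
```

Because the cleanup runs on the whole line, nothing has to be flattened and regrouped afterwards. With the question's input, this should leave the lines as 0,1,2,3,4 and 5,6,7,8,9.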