Pig: How do I parse a tuple with a variable number of elements?

Asked: 2015-03-22 21:15:11

Tags: apache-pig

This is my output file, which I wrote with another Pig script:

1   3,5 
2   4,6,7

I am trying to parse each line as (chararray, tuple):

data = load 'test45'  as (x:chararray, y:tuple());

But when I try to dump the tuples, they come out empty:

rows = foreach data generate y;

()
()

1 Answer:

Answer 0: (score: 1)

Try this. Load each line as a single string, then split it yourself with STRSPLIT:

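   -- Load each line as one string.
   X = LOAD 'pigtuple.txt' AS (str:chararray);

   -- Split on whitespace into the id and the comma-separated attribute string.
   X1 = FOREACH X GENERATE FLATTEN(STRSPLIT(str, '\\s+')) AS (id:int, attr:chararray);

   -- Split the attribute string on commas; STRSPLIT returns the pieces as a tuple.
   X3 = FOREACH X1 GENERATE id, STRSPLIT(attr, ',') AS (y:tuple());

   X4 = foreach X3 GENERATE id,y;

   dump X4;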

If you want to access individual elements of the tuple by position:

   X4 = foreach X3 GENERATE y.$0,y.$1;
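
Because the number of elements differs from row to row, positional access only works for positions that every row has. As a possible alternative (a sketch, not part of the original answer), the comma-separated field can be turned into a bag and flattened, so that each element becomes its own row. This assumes a Pig release whose built-in TOKENIZE accepts a delimiter as its second argument; the aliases B and E and the field names vals and val are made up for illustration.

```pig
-- Same loading and first split as in the answer above.
X  = LOAD 'pigtuple.txt' AS (str:chararray);
X1 = FOREACH X GENERATE FLATTEN(STRSPLIT(str, '\\s+')) AS (id:int, attr:chararray);

-- Assumption: TOKENIZE(expression, 'delimiter') is available in your Pig version.
-- It returns a bag holding one single-field tuple per comma-separated element.
B = FOREACH X1 GENERATE id, TOKENIZE(attr, ',') AS vals;

-- Flattening the bag emits one (id, val) row per element,
-- however many elements a given line contains.
E = FOREACH B GENERATE id, FLATTEN(vals) AS val;

DUMP E;
```

A bag suits variable-length data better than a tuple when you want to iterate over or aggregate the elements (for example with COUNT), since it does not depend on a fixed position existing in every row.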