My Pig script:
A = LOAD 'average.txt' as line;
B = FOREACH A GENERATE REGEX_EXTRACT_ALL(line,'^(.*?)\\s+(.*?)\\s+(.*?)$') AS TUPLE(AA:chararray,BB:chararray,CC:chararray);
C = FILTER B BY tuple_0.AA IS NOT NULL;
D = GROUP C BY $0.AA;
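To see why the grouped output nests the key inside every bagged tuple, here is a plain Python model of the script above. It is only a sketch of the semantics, not Pig code: `parse` stands in for REGEX_EXTRACT_ALL with the same pattern, and `group_by_first_field` stands in for the FILTER plus GROUP; both names are hypothetical.

```python
import re

# Same pattern as the Pig script: three whitespace-separated fields, anchored at both ends.
LINE_RE = re.compile(r'^(.*?)\s+(.*?)\s+(.*?)$')


def parse(line):
    """Like REGEX_EXTRACT_ALL: the captured groups as a tuple, or None if no match."""
    m = LINE_RE.match(line)
    return m.groups() if m else None


def group_by_first_field(lines):
    """Like FILTER ... IS NOT NULL followed by GROUP ... BY AA.

    Returns (key, [row, ...]) pairs. Each row still carries the key as its
    first field, which is why 1 and 2 reappear inside the grouped bags.
    """
    groups = {}
    for line in lines:
        row = parse(line)
        if row is None:  # unmatched lines are dropped, as the FILTER does
            continue
        groups.setdefault(row[0], []).append(row)
    # Pig does not guarantee group order; a Python dict keeps insertion order.
    return list(groups.items())


if __name__ == "__main__":
    data = ["1 a b", "1 c d", "2 e f", "2 g h"]
    for key, rows in group_by_first_field(data):
        print(key, rows)
```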
Output after the GROUP statement:
(1,{((1,a,b)),((1,c,d))})
(2,{((2,e,f)),((2,g,h))})
I need the final output to look like this:
(1,a,b,c,d)
(2,e,f,g,h)
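The requested shape drops the repeated key from each bagged tuple and concatenates the remaining fields after the group key. In Pig this would need a FOREACH over D that flattens the bag into one tuple (for example with a user-defined function); the Python below only models that transformation, and `flatten_group` is a hypothetical name.

```python
def flatten_group(key, rows):
    """Turn one grouped record (key, [(key, b, c), ...]) into (key, b1, c1, b2, c2, ...).

    The key repeated inside each row is dropped; any number of rows is handled.
    """
    out = [key]
    for _key, b, c in rows:
        out.extend((b, c))
    return tuple(out)


if __name__ == "__main__":
    grouped = [
        ("1", [("1", "a", "b"), ("1", "c", "d")]),
        ("2", [("2", "e", "f"), ("2", "g", "h")]),
    ]
    for key, rows in grouped:
        print(flatten_group(key, rows))
```

Because the output width grows with the number of rows per key, this approach does not depend on every key having exactly two rows.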
Output of DESCRIBE D:
| D | group:chararray | C:bag{:tuple(tuple_0:tuple(AA:chararray,BB:chararray,CC:chararray))}
Answer (score: 0)
Instead of grouping by $0.AA, I suggest a self-join on C:
A = LOAD 'average.txt' as line;
B = FOREACH A GENERATE REGEX_EXTRACT_ALL(line,'^(.*?)\\s+(.*?)\\s+(.*?)$') AS TUPLE(AA:chararray,BB:chararray,CC:chararray);
C = FILTER B BY tuple_0.AA IS NOT NULL;
C = FOREACH C GENERATE tuple_0.AA AS AA, tuple_0.BB AS BB, tuple_0.CC AS CC; -- rename the nested fields to simple column names
D = FOREACH C GENERATE AA, BB, CC; -- copy of C: Pig does not accept the same alias twice as JOIN input
CD = JOIN C BY AA, D BY AA;
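CD2 = FOREACH CD GENERATE
    C::AA AS AA,
    C::BB AS CBB,
    C::CC AS CCC,
    D::BB AS DBB,
    D::CC AS DCC;
-- the join pairs every row with every row of the same key (itself included),
-- so keep one ordering only; this assumes two rows per key with distinct BB values
E = FILTER CD2 BY CBB < DBB;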
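The JOIN pairs each row with every row that shares its key, including itself, so with two rows per key it emits four rows per key rather than one. The Python below is a sketch that models that self-join and one way to keep a single row per key, a filter on CBB < DBB; it assumes exactly two rows per key with distinct BB values, and the names `self_join` and `keep_one_pair` are hypothetical.

```python
from itertools import product


def self_join(rows):
    """Model of JOIN C BY AA, D BY AA where D is a copy of C.

    Emits (AA, CBB, CCC, DBB, DCC) for every pair of rows with equal keys.
    """
    out = []
    for (ka, ba, ca), (kb, bb, cb) in product(rows, rows):
        if ka == kb:
            out.append((ka, ba, ca, bb, cb))
    return out


def keep_one_pair(joined):
    """Model of a FILTER ... BY CBB < DBB: drops self-pairs and mirrored duplicates."""
    return [r for r in joined if r[1] < r[3]]


if __name__ == "__main__":
    rows = [("1", "a", "b"), ("1", "c", "d"), ("2", "e", "f"), ("2", "g", "h")]
    joined = self_join(rows)
    print(len(joined), "rows after the join")
    for r in keep_one_pair(joined):
        print(r)
```

With three or more rows per key the filter still returns several rows per key, which is where a GROUP followed by flattening the bag scales better than a self-join.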
I hope this helps.