Using Pig to populate the maximum value onto adjacent records with the same key

Time: 2017-05-26 06:42:23

Tags: hadoop apache-pig

My dataset is as follows:

key,value
---------
key1|10
key1|20
key1|30
key2|50
key2|70

I need to populate a new column with the maximum of the "value" column for each key.

The output must be:

key1|10|30
key1|20|30
key1|30|30
key2|50|70
key2|70|70

Below is my Pig script, which fails with an error:
A = LOAD 'input.txt' using PigStorage('|');
B = foreach A generate $0,$1,max($1);


grunt> A = LOAD 'input.txt' using PigStorage('|');
grunt> B = foreach A generate $0,$1,max($1);

2017-05-26 06:48:02,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve max using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

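1 answer:

Answer 0: (score: 0)

The following code should do it. ERROR 1070 appears because Pig's built-in function names are case-sensitive, so max does not resolve while MAX does. In addition, aggregate functions such as MAX, MIN, and AVG operate on a bag, so be sure to GROUP the relation first before using them.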

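-- Load the pipe-delimited input from the question with an explicit schema.
A = LOAD 'input.txt' USING PigStorage('|') AS (id: chararray, val: int);
-- Group by key so that MAX can be applied to each key's bag of rows.
B = GROUP A BY id;
C = FOREACH B GENERATE group AS id, MAX(A.val) AS maxval;
-- Join the per-key maximum back onto the original rows.
D = JOIN A BY id, C BY id;
E = FOREACH D GENERATE A::id, A::val, C::maxval;
DUMP E;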

Running this, you should get output like the following (DUMP does not guarantee row order):

(key1,30,30)
(key1,20,30)
(key1,10,30)
(key2,70,70)
(key2,50,70)
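
If the join feels heavy for this, one possible alternative is to flatten the grouped bag back into rows alongside the aggregate. This is only a sketch and not part of the original answer. It assumes the pipe-delimited input.txt from the question; the field names id and val and the relation names are illustrative choices, not something the question specifies.

```pig
-- Assumption: input.txt holds pipe-delimited rows such as key1|10.
A = LOAD 'input.txt' USING PigStorage('|') AS (id: chararray, val: int);

-- Collect all rows that share a key into one bag per key.
B = GROUP A BY id;

-- FLATTEN(A) expands each bag back into its original rows, and
-- MAX(A.val) is evaluated once per group and repeated on every row.
C = FOREACH B GENERATE FLATTEN(A), MAX(A.val) AS maxval;

DUMP C;
```

This should yield the same three columns (key, value, per-key maximum) as the join version, again in no guaranteed order, while skipping the separate JOIN step.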