My dataset is as follows:
key,value
---------
key1|10
key1|20
key1|30
key2|50
key2|70
I need to add a new column that holds the maximum of the "value" column for each key.
The output must be:
key1|10|30
key1|20|30
key1|30|30
key2|50|70
key2|70|70
Below is my Pig script, but I am running into issues.
A = LOAD 'input.txt' using PigStorage('|');
B = foreach A generate $0,$1,max($1);
grunt> A = LOAD 'input.txt' using PigStorage('|');
grunt> B = foreach A generate $0,$1,max($1);
2017-05-26 06:48:02,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve max using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Answer 0 (score: 0)
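The following code should do it. Pig function names are case-sensitive, so the built-in is MAX, not max, which is why the lookup fails with ERROR 1070. Also be sure to GROUP the relation first: aggregate functions such as MAX, MIN, and AVG operate on a bag, not on a single field.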
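-- Load with the same '|' delimiter as the input and declare a schema.
A = LOAD 'input.txt' USING PigStorage('|') AS (id: chararray, val: int);
-- Group by key so that MAX can operate on the bag of values for each key.
B = GROUP A BY id;
C = FOREACH B GENERATE group AS id, MAX(A.val) AS maxval;
-- Join the per-key maximum back onto every original row.
D = JOIN A BY id, C BY id;
E = FOREACH D GENERATE A::id, A::val, C::maxval;
DUMP E;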
Running this, you should get (row order may vary):
(key1,30,30)
(key1,20,30)
(key1,10,30)
(key2,70,70)
(key2,50,70)
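As a possible alternative, the join can be avoided by flattening the grouped bag back into rows in the same FOREACH that computes the maximum. This is only a minimal sketch, assuming the same 'input.txt' file and '|' delimiter as in the question; the aliases A, B, C and the field names id, val, maxval are illustrative choices, not part of the original script.

```pig
-- Assumed input: 'input.txt' with '|'-separated key and value, as in the question.
A = LOAD 'input.txt' USING PigStorage('|') AS (id: chararray, val: int);

-- Each group holds a bag named A containing all rows for that key.
B = GROUP A BY id;

-- FLATTEN(A) turns the bag back into one row per original record;
-- MAX(A.val) is computed once per group and repeated on each of those rows.
C = FOREACH B GENERATE FLATTEN(A), MAX(A.val) AS maxval;

DUMP C;
```

Because FLATTEN(A) emits one row per original record with the group's maximum alongside it, this should yield the same (id, val, maxval) rows as the join version, though the row order is not guaranteed.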