I'm adding an Accumulator implementation to a Pig UDF, and I want to test it.
What is the shortest and simplest Pig script that will use the accumulator?
For simplicity's sake, assume that it will load a file with N integers, where N > pig.accumulative.batchsize so that the accumulate() method will be called more than once.
data = LOAD 'input' AS (val1:int);
output = ... (code which uses the UDF comes here)
STORE output INTO 'output';
Answer 0 (score: 0)
It looks like this is enough:
data = LOAD 'input' AS (val1:int);
output = FOREACH (GROUP data ALL) GENERATE ACCUMULATIVE_UDF(data.val1);
STORE output INTO 'output';
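To exercise the multi-batch path, the input file needs more rows than pig.accumulative.batchsize. Below is a minimal sketch of a helper that generates such a file. The function name write_input, the constant ASSUMED_BATCH_SIZE, and its value of 20000 are assumptions made for illustration; check the actual batch size configured for your Pig installation.

```python
def write_input(path, n):
    """Write the integers 1..n to path, one per line, and return their sum."""
    with open(path, "w") as f:
        for i in range(1, n + 1):
            f.write(f"{i}\n")
    # Closed-form sum of 1..n, handy for checking a SUM-like UDF's result.
    return n * (n + 1) // 2


# Assumption: 20000 is used here as the batch size; verify the value of
# pig.accumulative.batchsize in your own environment.
ASSUMED_BATCH_SIZE = 20000

if __name__ == "__main__":
    # Write a bit more than two batches so accumulate() should run several times.
    total = write_input("input", 2 * ASSUMED_BATCH_SIZE + 1)
    print("rows written:", 2 * ASSUMED_BATCH_SIZE + 1)
    print("sum of values:", total)
```

If the UDF computes something like a sum or a count, the returned total gives a known value to compare against the stored output. Lowering pig.accumulative.batchsize for the test run may be an alternative to generating a large file.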