Question

I'm adding an Accumulator implementation to a Pig UDF, and I want to test it.

What is the shortest and simplest Pig script that will use the accumulator?

For simplicity's sake, assume that it will load a file with N integers, where N > pig.accumulative.batchsize so that the accumulate() method will be called more than once.

data = LOAD 'input' AS (val1:int);

output = ... (code which uses the UDF comes here)

STORE output INTO 'output';

Answer 1

It looks like this is sufficient:

data = LOAD 'input' AS (val1:int);

output = FOREACH (GROUP data ALL) GENERATE ACCUMULATIVE_UDF(data.val1);

STORE output INTO 'output';
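For completeness, here is a sketch of the same script with the UDF registered and the grouping split into its own step. The jar name `my-udfs.jar` and the class `com.example.MyAccumulatorUdf` are placeholders I made up, not names from the question; substitute your own. After `GROUP data ALL` the grouped bag takes the name of the relation, so the column is passed to the UDF as `data.val1`.

```pig
-- Hypothetical jar and class names; replace them with your own.
REGISTER 'my-udfs.jar';
DEFINE ACCUMULATIVE_UDF com.example.MyAccumulatorUdf();

data = LOAD 'input' AS (val1:int);

-- GROUP ALL puts every row into a single bag named "data".
grouped = GROUP data ALL;

-- The UDF receives the bag of val1 values.
result = FOREACH grouped GENERATE ACCUMULATIVE_UDF(data.val1);

STORE result INTO 'output';
```

As far as I know, Pig only switches to accumulator mode when every UDF in the FOREACH implements Accumulator, and if the UDF also implements Algebraic, Pig will generally prefer the Algebraic path. So for a test it is probably safest to keep the UDF as the only expression in the GENERATE, and to log something inside accumulate() to confirm it is called more than once.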

Shortest Pig script that will use Accumulator
