How to use the Over function in Pig

Time: 2014-12-11 02:24:33

Tags: apache-pig window-functions cumulative-sum

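I'm new to Apache Pig and can't figure out what is wrong with my use of piggybank's Over function for a cumulative calculation. Given the data below, I want the cumulative salary per period within each business and location: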

business|location|period|salary
--------+--------+------+-------
100     |  East  |   1  |  100
100     |  East  |   1  |  55
100     |  East  |   2  |  100
100     |  East  |   3  |  150
100     |  West  |   1  |  150
100     |  West  |   2  |  200
100     |  West  |   3  |  250
200     |  East  |   1  |  50
200     |  East  |   2  |  50
200     |  East  |   3  |  50
200     |  West  |   1  |  80
200     |  West  |   2  |  100
200     |  West  |   3  |  120

The result I'm looking for is:

business|location|period|cumulative salary
--------+--------+------+---------------
  100   |  East  |  1   |    155
  100   |  East  |  2   |    255
  100   |  East  |  3   |    405
  100   |  West  |  1   |    150
  100   |  West  |  2   |    350
  100   |  West  |  3   |    600
  200   |  East  |  1   |    50
  200   |  East  |  2   |    100
  200   |  East  |  3   |    150
  200   |  West  |  1   |    80
  200   |  West  |  2   |    180
  200   |  West  |  3   |    300

According to the Over doc, I should be able to do this with:

REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;
A = LOAD '/user/sliang/pig/testData' USING PigStorage(',') as (business:long, location:chararray, period:long, salary:long);
B = group A by (business, location);
C = foreach B {
    C1 = order A by period;
    generate flatten(Stitch(C1, Over(C1.salary, 'sum(long)')));
};
D = foreach C generate business, location, period, $9;

But I get an error starting at C:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve Stitch using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

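I googled it, but there isn't much information about this. I also tried other piggybank functions from the same jar and they work, so I don't think the problem is that piggybank isn't registered correctly. I'm using Pig version 0.12.

Any help is much appreciated. Thanks!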

1 Answer:

Answer 0: (score: 4)

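Use the full package path for both Stitch and Over.

That is, replace Stitch with org.apache.pig.piggybank.evaluation.Stitch and Over with org.apache.pig.piggybank.evaluation.Over. The ERROR 1070 message lists the only import prefixes Pig searched (java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.), and the piggybank package is not among them.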

If you want to avoid the lengthy package names above in your Pig script, define your own aliases with DEFINE and use them in the script instead.

DEFINE MYOVER org.apache.pig.piggybank.evaluation.Over;  

DEFINE MYSTITCH org.apache.pig.piggybank.evaluation.Stitch;  
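
As a rough sketch of how the two aliases above might be used, the core of the script could look like the following. The jar path, input path, and schema are taken from the question. The column name cumulative_salary is only an illustrative choice, and $4 assumes that Stitch appends the single Over column after the four input columns.

```pig
-- Sketch only: jar path, input path, and schema come from the question.
REGISTER /opt/mapr/pig/pig-0.12/contrib/piggybank/java/piggybank.jar;

-- Short aliases for the fully qualified piggybank classes.
DEFINE MYOVER org.apache.pig.piggybank.evaluation.Over;
DEFINE MYSTITCH org.apache.pig.piggybank.evaluation.Stitch;

A = LOAD '/user/sliang/pig/testData' USING PigStorage(',')
    AS (business:long, location:chararray, period:long, salary:long);
B = GROUP A BY (business, location);
C = FOREACH B {
    C1 = ORDER A BY period;
    -- Stitch places the running sum produced by Over next to each input row.
    GENERATE FLATTEN(MYSTITCH(C1, MYOVER(C1.salary, 'sum(long)')));
};
-- Positional references: $0..$3 are the input columns, $4 is the running sum.
-- cumulative_salary is a hypothetical column name chosen for this sketch.
D = FOREACH C GENERATE $0 AS business, $1 AS location, $2 AS period, $4 AS cumulative_salary;
```

Like the full script, this still yields one row per input row, so a period that has several input rows shows up several times in D.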

Updated Pig script:

A =  LOAD '/user/sliang/pig/testData' USING PigStorage(',') as (business:long, location:chararray, period:long, salary:long);
B = group A by (business, location);
C = foreach B {
    C1 = order A by period;
    generate flatten(org.apache.pig.piggybank.evaluation.Stitch(C1, org.apache.pig.piggybank.evaluation.Over(C1.salary, 'sum(long)')));
};
D = foreach C generate business, location, period, $4;

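-- Over emits one running total per input row, so a period with several
-- input rows (e.g. 100/East/period 1) appears more than once in D.
-- RANK prepends a sequential row number, rank_D, to every row.
E = RANK D;
F = GROUP E BY (stitched::business,stitched::location,stitched::period);
G = FOREACH F {
                 -- Keep only the row with the largest rank_D in each period.
                 sortRankByDesc = ORDER E BY rank_D DESC;
                 topRank = LIMIT sortRankByDesc 1;
                 GENERATE FLATTEN(topRank);
              }
-- Drop rank_D ($0) and name the remaining columns.
H = FOREACH G GENERATE $1 AS business,$2 AS location,$3 AS period,$4 AS salary;
DUMP H;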

Output:

(100,East,1,155)
(100,East,2,255)
(100,East,3,405)
(100,West,1,150)
(100,West,2,350)
(100,West,3,600)
(200,East,1,50)
(200,East,2,100)
(200,East,3,150)
(200,West,1,80)
(200,West,2,180)
(200,West,3,300)