Cannot resolve Over() in Apache Pig

Posted: 2014-04-23 19:08:03

Tags: hadoop apache-pig

I get the following error when using Over() in Pig:

Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve Over using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

The error is raised at the closing brace of statement C when running:
A = load 'data/watch*.txt' as (id,ts,watch);
B= GROUP A BY id;
C= FOREACH B {
  C1 = ORDER A BY ts;
  GENERATE FLATTEN(Stitch(C1,Over(C1.watch,'lag',-1,0)));
}

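It looks to me as if Over() is not included in my Pig installation, but I'm not sure why, since I believe my Pig and Hadoop versions should be up to date.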

$ pig -version
Apache Pig version 0.12.1-SNAPSHOT (rexported)
compiled Feb 19 2014, 16:31:42

$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar

Any insight is greatly appreciated. I'm now wondering whether I should be using the Over() UDF from PiggyBank instead.

1 answer:

Answer 0 (score: 5)

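I believe Pig 0.12's built-in functions do not include an Over function; it ships in piggybank instead, so you need to register piggybank and define Over from there. The same applies to Stitch, which the script also uses and which lives in piggybank as org.apache.pig.piggybank.evaluation.Stitch.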

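-- adjust the path to wherever piggybank.jar lives on your system
REGISTER piggybank.jar;
DEFINE Over org.apache.pig.piggybank.evaluation.Over();
DEFINE Stitch org.apache.pig.piggybank.evaluation.Stitch();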

A = load 'data/watch*.txt' as (id,ts,watch);
B= GROUP A BY id;
C= FOREACH B {
  C1 = ORDER A BY ts;
  GENERATE FLATTEN(Stitch(C1,Over(C1.watch,'lag',-1,0)));
}
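For readers unfamiliar with this windowing pattern, the following Python sketch models what the GROUP / ORDER / Stitch / Over('lag') pipeline is meant to compute: within each id, rows are sorted by ts and each row is paired with the previous row's watch value, with a default for the first row. This is a conceptual model only, not Pig or piggybank code. It assumes a one-row lag with default 0; how piggybank interprets the exact arguments (-1, 0) should be checked against the Over javadoc, and the function name lag_by_group is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def lag_by_group(rows, default=0):
    """Hypothetical model of:
        B = GROUP A BY id;
        C1 = ORDER A BY ts;
        GENERATE FLATTEN(Stitch(C1, Over(C1.watch, 'lag', ...)));
    assuming a one-row lag. rows is a list of (id, ts, watch) tuples.
    Returns (id, ts, watch, previous_watch) tuples."""
    out = []
    # GROUP A BY id: sort by id first so groupby sees each id exactly once.
    for _, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # ORDER A BY ts within the group.
        ordered = sorted(group, key=itemgetter(1))
        prev = default  # value used for the first row of each group
        for rid, ts, watch in ordered:
            # Stitch: original columns plus the lagged column.
            out.append((rid, ts, watch, prev))
            prev = watch
    return out


rows = [("a", 2, 5), ("b", 1, 7), ("a", 1, 3), ("a", 3, 4)]
print(lag_by_group(rows))
# → [('a', 1, 3, 0), ('a', 2, 5, 3), ('a', 3, 4, 5), ('b', 1, 7, 0)]
```

Pig does not guarantee the order of groups in its output; the sketch sorts by id only so that groupby sees each key once.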