Error running a UDF in Pig

Time: 2013-10-29 10:28:59

Tags: java hadoop apache-pig

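I'm trying to get a UDF to run in Pig, but I've run into some problems. When I try to run the Pig script, it fails, saying it cannot instantiate mathPow with value 'null'. If anybody can help, that would be great.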

Thanks

The Pig script is as follows:

REGISTER MathPower.jar;
A = load 'input' using PigStorage(',');
-- Without a schema, $0 and $1 are bytearrays; cast them so the UDF receives Integers
C = foreach A generate (int)$0 as x, (int)$1 as z;
B = foreach C generate powUDF.mathUDF(x, z);
dump B;

The input file contains:

2,3
4,5

The Java code is below. I haven't added any extra libraries; I just followed a tutorial. I'm using Java 1.6 and Eclipse:

package powUDF;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.commons.logging.Log;
import org.apache.*;

/**
 * A simple UDF that takes a value and raises it to the power of a second
 * value.  It can be used in a Pig Latin script as Pow(x, y), where x and y
 * are both expected to be ints.
 */
public class mathUDF extends EvalFunc<Long> {

    public Long exec(Tuple input) throws IOException {
        try {
            /* Rather than give you explicit arguments, UDFs are always handed
             * a tuple.  The UDF must know the arguments it expects and pull
             * them out of the tuple.  These next two lines get the first and
             * second fields out of the input tuple that was handed in.  Since
             * Tuple.get returns Objects, we must cast them to Integers.  If
             * the cast fails, an exception will be thrown.
             */
            int base = (Integer) input.get(0);
            int exponent = (Integer) input.get(1);
            long result = 1;

            /* Probably not the most efficient method... */
            for (int i = 0; i < exponent; i++) {
                long preresult = result;
                result *= base;
                if (preresult > result) {
                    // We overflowed.  Give a warning, but do not throw an
                    // exception.
                    warn("Overflow!", PigWarning.TOO_LARGE_FOR_INT);
                    // Returning null will indicate to Pig that we failed but
                    // we want to continue execution.
                    return null;
                }
            }
            return result;
        } catch (Exception e) {
            // Throwing an exception will cause the task to fail.
            throw new IOException("Something bad happened!", e);
        }
    }
}
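The overflow check in the loop (`preresult > result`) is a heuristic that can misfire: with a negative base, the very first multiplication (1 × -2) already makes `preresult > result` true, so the UDF reports an overflow that never happened. The following is a minimal, Pig-free sketch of the same calculation that detects overflow exactly by computing in `java.math.BigInteger` and checking whether the result still fits in a `long`. The class name `PowSketch` and the choice to return `null` for negative exponents are assumptions for illustration; it uses only JDK classes, so it should also compile on Java 1.6.

```java
import java.math.BigInteger;

/**
 * Hypothetical, Pig-free sketch of the power calculation in mathUDF,
 * with exact overflow detection. PowSketch is not part of Pig.
 */
final class PowSketch {

    /**
     * Returns base^exponent, or null when the exponent is negative or the
     * result does not fit in a long (mirroring the UDF's "return null" convention).
     */
    static Long pow(int base, int exponent) {
        if (exponent < 0) {
            return null; // assumption: negative exponents are treated as "no result"
        }
        // For |base| >= 2, any exponent above 63 overflows a long; bail out
        // early instead of building a huge BigInteger.
        if (exponent > 63 && (base > 1 || base < -1)) {
            return null;
        }
        BigInteger exact = BigInteger.valueOf(base).pow(exponent);
        // A long holds 64 bits including the sign; bitLength() excludes the sign bit.
        if (exact.bitLength() > 63) {
            return null;
        }
        return Long.valueOf(exact.longValue());
    }

    public static void main(String[] args) {
        // The two rows of the sample input file: 2,3 and 4,5
        System.out.println(pow(2, 3));  // 8
        System.out.println(pow(4, 5));  // 1024
        System.out.println(pow(-2, 3)); // -8 (the loop-based check returns null here)
        System.out.println(pow(2, 63)); // null: 2^63 does not fit in a long
    }
}
```

Computing exactly and then range-checking trades a little speed for a check that cannot be fooled by wrap-around or by sign changes.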

Stack trace

Pig Stack Trace

ERROR 1200: could not instantiate 'powUDF2.mathUDF' with arguments 'null'

Failed to parse: could not instantiate 'powUDF2.mathUDF' with arguments 'null'
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:193)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1571)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1544)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:538)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.RuntimeException: could not instantiate 'powUDF2.mathUDF' with arguments 'null'
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:618)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:193)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.parser.LogicalPlanBuilder.expandAndResetVisitor(LogicalPlanBuilder.java:392)
    at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:924)
    at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:14195)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1623)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
    ... 15 more
Caused by: java.lang.Error: Unresolved compilation problems:
    The type org.apache.commons.logging.Log cannot be resolved. It is indirectly referenced from required .class files
    The import org.apache.commons.logging.Log cannot be resolved
    The type org.apache.hadoop.io.WritableComparable cannot be resolved. It is indirectly referenced from required .class files
    PigWarning cannot be resolved to a variable

    at powUDF2.mathUDF.<init>(mathUDF.java:1)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at java.lang.Class.newInstance0(Class.java:355)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:588)
    ... 37 more
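The last `Caused by` section shows where things really go wrong: Pig creates the UDF reflectively (`Class.newInstance`, called from `PigContext.instantiateFuncFromSpec`), and the `mathUDF` constructor itself throws `java.lang.Error: Unresolved compilation problems`, which is the error Eclipse's compiler puts into class files built from source that had compile errors. Because the instantiation is plain reflection, it can be reproduced outside Pig. Below is a minimal sketch of such a check; `InstantiateCheck` is a hypothetical helper, not Pig's code, and its message only imitates the wording in the trace.

```java
import java.lang.reflect.InvocationTargetException;

/**
 * Hypothetical diagnostic: instantiate a class by name through its no-arg
 * constructor, roughly the way the stack trace shows Pig creating a UDF.
 * This is not Pig's implementation.
 */
final class InstantiateCheck {

    static Object instantiate(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            return cls.getDeclaredConstructor().newInstance();
        } catch (InvocationTargetException e) {
            // The constructor itself threw, e.g. an "Unresolved compilation
            // problems" Error; report that underlying cause.
            throw new RuntimeException(
                    "could not instantiate '" + className + "': " + e.getCause(), e.getCause());
        } catch (Exception e) {
            // Class not found, no accessible no-arg constructor, abstract class, ...
            throw new RuntimeException("could not instantiate '" + className + "': " + e, e);
        } catch (LinkageError e) {
            // A class the target depends on could not be loaded or initialized.
            throw new RuntimeException("could not instantiate '" + className + "': " + e, e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "powUDF.mathUDF";
        Object udf = instantiate(name, InstantiateCheck.class.getClassLoader());
        System.out.println("OK: created " + udf.getClass().getName());
    }
}
```

For example, with `MathPower.jar` plus the Pig and Hadoop jars on the classpath, running `InstantiateCheck powUDF.mathUDF` should either print `OK: ...` or surface the same underlying cause as the Pig trace, without going through Grunt.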
================================================================================

1 answer:

Answer 0 (score: 1)

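When you build the jar, you must include all the libraries that your classes reference. I build with Ant to make sure dependencies like this are managed properly. Try running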

jar -tf MathPower.jar

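and check whether the class org/apache/commons/logging/Log appears anywhere in the output. Your UDF imports it, but Pig cannot find it, as the end of the stack trace shows. Likewise, you seem to be missing classes needed to interact with Hadoop: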

The type org.apache.hadoop.io.WritableComparable cannot be resolved.

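Make sure that class is included in the jar you build as well. Alternatively, you could REGISTER the jars that contain the classes you reference, but I'm not sure whether that will work.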
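The `jar -tf` check suggested above can also be scripted. The following minimal sketch reports which of the given fully qualified class names have no matching `.class` entry in a jar; the `JarCheck` class and its methods are hypothetical, built only on the JDK's `java.util.jar.JarFile`, and written without try-with-resources so it stays Java 6 compatible.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper: which classes are missing from a jar? */
final class JarCheck {

    /** Maps "org.apache.commons.logging.Log" to "org/apache/commons/logging/Log.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the names from classNames that have no entry in the jar. */
    static List<String> missing(File jar, String... classNames) throws IOException {
        List<String> result = new ArrayList<String>();
        JarFile jarFile = new JarFile(jar);
        try {
            for (String name : classNames) {
                if (jarFile.getJarEntry(entryName(name)) == null) {
                    result.add(name);
                }
            }
        } finally {
            jarFile.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarCheck MathPower.jar
        File jar = new File(args.length > 0 ? args[0] : "MathPower.jar");
        List<String> absent = missing(jar,
                "powUDF.mathUDF",
                "org.apache.commons.logging.Log",
                "org.apache.hadoop.io.WritableComparable");
        System.out.println(absent.isEmpty() ? "all classes present" : "missing: " + absent);
    }
}
```

Running it against `MathPower.jar` shows at a glance which of the classes named in the stack trace are absent from the jar.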