Pig UDF throws an error

Date: 2014-02-23 16:15:15

Tags: hadoop, user-defined-functions, apache-pig

I am getting an error from my Pig script.

Pig script:

REGISTER /var/lib/hadoop-hdfs/udf.jar;

REGISTER /var/lib/hadoop-hdfs/udf2.jar;

INPUT_LINES = Load 'hdfs:/Inputdata/DATA_GOV_US_Farmers_Market_DataSet.csv' using PigStorage(',') AS (FMID:chararray, MarketName:chararray, Website:chararray, Street:chararray, City:chararray, County:chararray, State:chararray, Zip:chararray, Schedule:chararray, X:chararray, Y:chararray, Location:chararray, Credit:chararray, WIC:chararray, WICcash:chararray, SFMNP:chararray, SNAP:chararray, Bakedgoods:chararray, Cheese:chararray, Crafts:chararray, Flowers:chararray, Eggs:chararray, Seafood:chararray, Herbs:chararray, Vegetables:chararray, Honey:chararray, Jams:chararray, Maple:chararray, Meat:chararray, Nursery:chararray, Nuts:chararray, Plants:chararray, Poultry:chararray, Prepared:chararray, Soap:chararray, Trees:chararray, Wine:chararray);

FILTERED_COUNTY = FILTER INPUT_LINES BY County=='Los Angeles';

REQUIRED_COLUMNS = FOREACH FILTERED_COUNTY GENERATE FMID,MarketName,$12..;

PER = FOREACH REQUIRED_COLUMNS GENERATE FMID,MarketName,fm($2..) AS Percentage;

STATUS = FOREACH PER GENERATE FMID,MarketName,Percentage,status(Percentage) AS Stat;

UDF1:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class fm extends EvalFunc<Integer>
{
    String temp;
    int per;
    int count = 0;

    public Integer exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return -1;
        try
        {
            for (int i = 0; i < 25; i++)
            {
                if (input.get(i) == "" || input.get(i) == null)
                    return -1;

                temp = (String) input.get(i);
                if (temp.equals("Y"))
                    count++;
            }
            per = count * 4;
            count = 0;
            return per;
        }
        catch (Exception e)
        {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

UDF2:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class status extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0)
            return null;
        try
        {
            String str = (String) input.get(0);
            int i = Integer.parseInt(str);
            if (i >= 60)
                return "HIGH";
            else if (i <= 40)
                return "LOW";
            else
                return "MEDIUM";
        }
        catch (Exception e)
        {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

Dataset:

https://onedrive.live.com/redir?resid=7F81451078F4DBE8%21113

Error:

Pig Stack Trace

ERROR 2078: Caught error from UDF: status [Caught exception processing input row ]

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias STATUS. Backend error : Caught error from UDF: status [Caught exception processing input row ]
    at org.apache.pig.PigServer.openIterator(PigServer.java:828)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:538)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: status [Caught exception processing input row ]
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:365)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)

1 Answer:

Answer (score: 3):

It looks like your problem may be that you are casting the input to a String in your status UDF. Your fm UDF actually returns an Integer, so you should have:

Integer i = (Integer)input.get(0);

Unless you fix that, it will definitely cause problems. Without the original error message, I can't say whether there were other issues before this one.
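A minimal sketch of the corrected status UDF under that assumption (since fm returns an Integer, the Integer.parseInt step is no longer needed):

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class status extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0)
            return null;
        try
        {
            // fm returns an Integer, so read it as an Integer instead of casting to String
            Integer i = (Integer) input.get(0);
            if (i == null)
                return null;
            if (i >= 60)
                return "HIGH";
            else if (i <= 40)
                return "LOW";
            else
                return "MEDIUM";
        }
        catch (Exception e)
        {
            // include the original message to make future debugging easier
            throw new IOException("Caught exception processing input row " + e.getMessage(), e);
        }
    }
}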

I would have expected your stack trace to include the original exception message, which would help you debug this. It's odd that it doesn't. Without it, you just have to analyze the code.

This may help with debugging in the future:

throw new IOException("Caught exception processing input row " + e.getMessage(), e);

For the fm UDF, I would also suggest making the variables temp, per, and count local to the exec method rather than instance fields of the class, since they don't need to be. That probably isn't causing the error, but it is better coding practice.
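As an illustration, a sketch of the fm UDF with those variables declared locally; it also compares against the empty string with equals() rather than ==, since == only compares object references:

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class fm extends EvalFunc<Integer>
{
    public Integer exec(Tuple input) throws IOException
    {
        if (input == null || input.size() == 0)
            return -1;
        try
        {
            int count = 0;                      // local, so no state carries over between calls
            for (int i = 0; i < 25; i++)
            {
                if (input.get(i) == null || "".equals(input.get(i)))
                    return -1;

                String temp = (String) input.get(i);
                if (temp.equals("Y"))
                    count++;
            }
            return count * 4;                   // percentage of the 25 product flags set to "Y"
        }
        catch (Exception e)
        {
            throw new IOException("Caught exception processing input row " + e.getMessage(), e);
        }
    }
}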