我在PIG脚本中收到错误。
PIG SCRIPT:
REGISTER /var/lib/hadoop-hdfs/udf.jar;
REGISTER /var/lib/hadoop-hdfs/udf2.jar;
INPUT_LINES = Load 'hdfs:/Inputdata/DATA_GOV_US_Farmers_Market_DataSet.csv' using PigStorage(',') AS (FMID:chararray, MarketName:chararray, Website:chararray, Street:chararray, City:chararray, County:chararray, State:chararray, Zip:chararray, Schedule:chararray, X:chararray, Y:chararray, Location:chararray, Credit:chararray, WIC:chararray, WICcash:chararray, SFMNP:chararray, SNAP:chararray, Bakedgoods:chararray, Cheese:chararray, Crafts:chararray, Flowers:chararray, Eggs:chararray, Seafood:chararray, Herbs:chararray, Vegetables:chararray, Honey:chararray, Jams:chararray, Maple:chararray, Meat:chararray, Nursery:chararray, Nuts:chararray, Plants:chararray, Poultry:chararray, Prepared:chararray, Soap:chararray, Trees:chararray, Wine:chararray);
FILTERED_COUNTY = FILTER INPUT_LINES BY County=='Los Angeles';
REQUIRED_COLUMNS = FOREACH FILTERED_COUNTY GENERATE FMID,MarketName,$12..;
PER = FOREACH REQUIRED_COLUMNS GENERATE FMID,MarketName,fm($2..) AS Percentage;
STATUS = FOREACH PER GENERATE FMID,MarketName,Percentage,status(Percentage) AS Stat;
UDF1:
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class fm extends EvalFunc<Integer>
{
String temp;
int per;
int count=0;
public Integer exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return -1;
try
{
for(int i=0;i<25;i++)
{
if(input.get(i) == "" || input.get(i) == null)
return -1;
temp = (String)input.get(i);
if(temp.equals("Y"))
count++;
}
per = count*4;
count = 0;
return per;
}
catch(Exception e)
{
throw new IOException("Caught exception processing input row ", e);
}
}
}
UDF2:
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class status extends EvalFunc<String>
{
public String exec(Tuple input) throws IOException
{
if (input == null || input.size() == 0)
return null;
try
{
String str = (String)input.get(0);
int i = Integer.parseInt(str);
if(i>=60)
return "HIGH";
else if(i<=40)
return "LOW";
else
return "MEDIUM";
}
catch(Exception e)
{
throw new IOException("Caught exception processing input row ", e);
}
}
}
数据集:
https://onedrive.live.com/redir?resid=7F81451078F4DBE8%21113
错误:
ERROR 2078: Caught error from UDF: status [Caught exception processing input row ]
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias STATUS. Backend error : Caught error from UDF: status [Caught exception processing input row ]
at org.apache.pig.PigServer.openIterator(PigServer.java:828)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: status [Caught exception processing input row ]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:365)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
答案 0 :(得分:3)
看来您的问题可能是您正在将输入转换为状态UDF中的String。你的fm UDF实际上返回一个Integer。所以你应该有:
Integer i = (Integer)input.get(0);
除非你修复它,否则这肯定会引起问题。如果没有原始错误消息,我无法说明之前是否存在其他问题。
我原本希望您的堆栈跟踪包含原始异常消息,这可以帮助您调试此问题。奇怪的是它没有。没有它,你只需要分析代码。
这可能有助于将来进行调试:
throw new IOException("Caught exception processing input row " + e.getMessage(), e);
对于fm UDF,我还建议为exec方法设置变量temp,per和count本地而不是类的实例,因为它们不需要。这可能不会导致错误,但它是更好的编码实践。