I am using a Java UDF, SelectionNote, in a Pig script.
It generates ENEE_ENR_GCP:
ENEE_ENR_GCP = FOREACH (GROUP ENEE_ENR BY IDT_GCP)
{
    ENEE_ENR_GROUP = ORDER ENEE_ENR BY IDT_GCP;
    GENERATE
        group AS IDT_GCP,
        FLATTEN(SelectionNote(ENEE_ENR_GROUP)) AS CD_NOT;
};
But it seems that no input data reaches the UDF:
2019-07-26 10:39:13,271 INFO [IPC Server handler 26 on 35915] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1549794175705_2758259_r_000000_2: Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: com.arkea.sni.udf.SelectionNote [Caught exception processing input row null]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:354)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDataBag(POUserFunc.java:370)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:335)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:405)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:322)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:262)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Caught exception processing input row null
at com.arkea.sni.udf.SelectionNote.exec(SelectionNote.java:59)
at com.arkea.sni.udf.SelectionNote.exec(SelectionNote.java:27)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:326)
The input data looks like this:
011958029,00000024,,1,20100209,1
011951228,00000036,,1,20100209,1
011964431,00000814,,1,20100227,1
003526500,00000863,,1,20080122,1
011950864,00001478,,1,20100209,1
011999168,00002495,X0,1,20100331,0
001684881,00002641,,1,19861126,1
001677981,00003165,,1,19861119,1
001677457,00003311,,1,19870114,1
001677161,00003440,,1,19870116,1
The function that consumes the input is:
@Override
public DataBag exec(Tuple input) throws IOException {
    try {
        if (input.get(0) == null || input.size() == 0)
            return null;
        // Retrieve the bag passed in from the Pig script
        DataBag bagFromPigScript = (DataBag) input.get(0);
        // List used to save the tuples in string form
        List<Personne> listPersonnes = new ArrayList<Personne>();
        for (Tuple tuple : bagFromPigScript) {
            listPersonnes.add(new Personne((String) tuple.get(1), (String) tuple.get(2),
                    (String) tuple.get(3), (String) tuple.get(4), (String) tuple.get(5)));
        }
        Tuple returnTuple = TupleFactory.getInstance().newTuple();
        List<Tuple> returnTupleList = new ArrayList<Tuple>();
        returnTuple.append(SelectionCodeNote((ArrayList<Personne>) listPersonnes));
        returnTupleList.add(returnTuple);
        return BagFactory.getInstance().newDefaultBag(returnTupleList);
    } catch (Exception e) {
        throw new IOException("Caught exception processing input row " + e.getMessage());
    }
}
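One observation about the guard in this `exec`: `input.get(0)` is evaluated before `input.size() == 0`, so an empty or null tuple can throw before the size check runs; and because the `catch` block appends `e.getMessage()` (which is null for a bare `NullPointerException`), the log would then read exactly "processing input row null". A minimal sketch of the safer check ordering, using a hypothetical `firstOrNull` helper on a plain `List` so it has no Pig dependencies:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuardOrderDemo {

    // Hypothetical helper: returns the first element, or null when the
    // input is null, empty, or holds a null first element. The null and
    // size checks run BEFORE get(0), so no exception can escape the guard.
    static String firstOrNull(List<String> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        return input.get(0);
    }

    public static void main(String[] args) {
        // With the original ordering (get(0) before the size check),
        // an empty list would throw here; the reordered guard returns null.
        System.out.println(firstOrNull(new ArrayList<String>())); // prints "null"
        System.out.println(firstOrNull(Arrays.asList("011958029")));
    }
}
```

This does not prove the reducer input is empty, but it does mean the "input row null" message cannot distinguish an empty tuple from a genuinely null field, so it is worth fixing before drawing conclusions from the log.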
Why is the input data not being read?
Thanks.