Here is the sample data I'm working with:
Peter Wilkerson 27 M
James Owen 26 M
Matt Wo 30 M
Kenny Chen 28 M
I created a simple UDF to filter by age, like this:
package com.udf;

import java.io.IOException;

import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

public class IsApplicable extends FilterFunc {
    @Override
    public Boolean exec(Tuple tuple) throws IOException {
        if (tuple == null || tuple.size() > 0) {
            return false;
        }
        try {
            Object object = tuple.get(0);
            if (object == null) {
                return false;
            }
            int age = (Integer) object;
            return age > 28;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
Here is the script I use to call this UDF:
records = LOAD '~/Documents/data.txt' AS (firstname:chararray,lastname:chararray,age:int,gender:chararray);
filtered_records = FILTER records BY com.udf.IsApplicable(age);
dump filtered_records;
The dump shows no records. What am I missing?
Answer 0 (score: 1)
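The condition tuple.size() > 0 in your if statement is true for any non-empty tuple, so for your data it is always true and execution never reaches the try block (i.e. the filtering logic). That is why you get an empty result. Can you change the if condition like this: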
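// Temporary debug line: print how many fields the UDF receives
System.out.println("TupleSize=" + tuple.size());
// Reject only null or empty input
if (tuple == null || tuple.size() == 0) {
    return false;
}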
Sample debug output from the console:
2015-02-13 07:40:46,994 [Thread-2] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[3,10],records[-1,-1],filtered_records[4,19] C: R:
TupleSize=1
TupleSize=1
TupleSize=1
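To see the effect of the guard without a Pig installation, here is a minimal plain-Java sketch. It assumes a java.util.List<Object> as a stand-in for Pig's Tuple, and the names GuardDemo, buggyIsApplicable and fixedIsApplicable are hypothetical. It runs the original check and the corrected one (tuple.size() == 0) over the four sample ages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuardDemo {
    // Original guard (List<Object> stands in for Pig's Tuple):
    // returns false for every non-empty input, so the age check never runs.
    static boolean buggyIsApplicable(List<Object> tuple) {
        if (tuple == null || tuple.size() > 0) {
            return false;
        }
        Object object = tuple.get(0);
        return object != null && (Integer) object > 28;
    }

    // Corrected guard: rejects only null or empty input, then applies age > 28.
    static boolean fixedIsApplicable(List<Object> tuple) {
        if (tuple == null || tuple.size() == 0) {
            return false;
        }
        Object object = tuple.get(0);
        return object != null && (Integer) object > 28;
    }

    public static void main(String[] args) {
        int[] ages = {27, 26, 30, 28}; // ages from the sample data
        List<Integer> buggy = new ArrayList<>();
        List<Integer> fixed = new ArrayList<>();
        for (int age : ages) {
            // One-field input, like the argument tuple built for IsApplicable(age)
            List<Object> tuple = Arrays.asList((Object) age);
            if (buggyIsApplicable(tuple)) {
                buggy.add(age);
            }
            if (fixedIsApplicable(tuple)) {
                fixed.add(age);
            }
        }
        System.out.println("buggy: " + buggy);
        System.out.println("fixed: " + fixed);
    }
}
```

Running it prints buggy: [] and fixed: [30], which matches the empty dump and the single record (Matt Wo, 30) expected for age > 28.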
Answer 1 (score: 0)
This returns false for every row:
if (tuple == null || tuple.size() > 0) {
return false;
}
And this takes userName instead of age:
Object object = tuple.get(0);
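Whether index 0 holds the age depends on what the script passes to the UDF. In the script above, IsApplicable(age) passes only the age field, which is consistent with the TupleSize=1 debug output. The sketch below models that in plain Java under the same assumption as before (a List<Object> standing in for Pig's Tuple); ArgumentTupleDemo, Person and argumentsFor are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ArgumentTupleDemo {
    // Hypothetical model of one loaded record: (firstname, lastname, age, gender)
    static final class Person {
        final String firstname;
        final String lastname;
        final int age;
        final String gender;

        Person(String firstname, String lastname, int age, String gender) {
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
            this.gender = gender;
        }
    }

    // Models the call IsApplicable(age): only the listed argument
    // ends up in the input tuple, so index 0 is the age.
    static List<Object> argumentsFor(Person p) {
        return Arrays.asList((Object) p.age);
    }

    public static void main(String[] args) {
        Person matt = new Person("Matt", "Wo", 30, "M");
        List<Object> tuple = argumentsFor(matt);
        System.out.println("TupleSize=" + tuple.size());
        System.out.println("tuple.get(0)=" + tuple.get(0));
    }
}
```

This prints TupleSize=1 and tuple.get(0)=30. Passing more fields, for example IsApplicable(firstname, age), would shift the age to index 1.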