我需要你的帮助才能知道如何在我的Pig udf函数中使用别名(存储的元组),我这样做了:
my_file.csv文件
101,message here
102,message here
103,message here
...
我的剧本PIG:
X = load'mydata.csv' using PigStorage(',') as (myVar:chararray);
A = load'my_file.csv' using PigStorage(',') as (key:chararray,value:chararray);
B = GROUP par ALL;
C = foreach B {
D = ORDER par BY key;
GENERATE BagToTuple(D);
};
the result of the C is something like (101,message here, 102, message here, 103, message here...)
现在我需要的是将这个结果传递给我的udf函数,如:
Z = foreach X generate MYUDF(myVar, C);
别名“C”是元组键,值,键,值......
MYUDF:
import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.PigWarning;
import org.apache.pig.data.DataType;
import org.apache.pig.impl.util.WrappedIOException;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class ReDecode extends EvalFunc<String> {
int numParams = -1;
Pattern mPattern = null;
@Override
public Schema outputSchema(Schema input) {
try {
return new Schema(new Schema.FieldSchema(getSchemaName(this
.getClass().getName().toLowerCase(), input),
DataType.CHARARRAY));
} catch (Exception e) {
return null;
}
}
@Override
public String exec(Tuple tuple) throws IOException {
if (numParams==-1) // Not initialized
{
numParams = tuple.size();
if (numParams <= 2) {
String msg = "Decode: Atleast an expression and default string is required.";
throw new IOException(msg);
}
if (tuple.size()%2!=0) {
String msg = "ItssPigUDFs.ReDecode : Some parameters are unmatched.";
throw new IOException(msg);
}
}
if (tuple.get(0)==null)
return null;
try {
for (int count = 1; count < numParams - 1; count += 2)
{
mPattern=Pattern.compile((String)tuple.get(count));
if (mPattern.matcher((String)tuple.get(0)).matches())
{
return (String)tuple.get(count+1);
}
}
} catch (ClassCastException e) {
warn("ItssPigUDFs.ReDecode : Data type error", PigWarning.UDF_WARNING_1);
return null;
} catch (NullPointerException e) {
String msg = "ItssPigUDFs.ReDecode : Encounter null in the input";
throw new IOException(msg);
}
return (String)tuple.get(tuple.size()-1);
}
感谢您的帮助
答案 0 :(得分:0)
我认为不需要numParams
;您到达UDF的参数数量为input.size()
。
因此,如果您致电MYUDF(myVar, C)
,那么您应该能够像String myVar = (String) input.get(0)
和Tuple param2 = input.get(1)
那样使用Java获取这些值。