How do I pass an entire relation to a UDF?

Time: 2014-04-24 08:48:17

Tags: hadoop apache-pig

Consider this script:

register udf-1.0.0-BETA.jar

A = LOAD '1.txt' USING PigStorage('\t') as (key1:chararray, val1:chararray);
B = LOAD '2.txt' USING PigStorage('\t') as (key2:chararray, val2:chararray);
joined = JOIN A by key1, B by key2;
out = FOREACH joined GENERATE com.example.UDF();
dump out;

As written, my UDF only receives the key. If I try this instead:

out = FOREACH joined GENERATE com.example.UDF(joined);

I get the exception "A column needs to be projected from a relation for it to be used as a scalar".

I can pass the whole relation by listing every field, like this:

out = FOREACH joined GENERATE com.example.UDF(A::key1, A::val1, B::key2, B::val2);

But this is verbose. Is there a simpler way?

1 Answer:

Answer 0: (score: 3)

Yes, try the following:

out = FOREACH joined GENERATE com.example.UDF(*);
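
As a rough sketch of why this works: inside a FOREACH, `*` stands for all fields of the current tuple, so for the joined relation in the question the call should be equivalent to listing the four fields by hand. The aliases `out_star` and `out_explicit` below are hypothetical names used only for illustration, and the snippet assumes the `A`, `B`, and `joined` definitions from the question's script.

```
-- Print the schema of the joined relation; these are the fields that * expands to.
DESCRIBE joined;

-- Assuming the schema from the question, both statements should hand the UDF
-- the same four fields (A::key1, A::val1, B::key2, B::val2) for each row.
out_star     = FOREACH joined GENERATE com.example.UDF(*);
out_explicit = FOREACH joined GENERATE com.example.UDF(A::key1, A::val1, B::key2, B::val2);
```

On the UDF side, the fields arrive as the elements of the input tuple passed to `exec`, in schema order, so the UDF can read them by position.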