Apache Pig and User Defined Functions

Time: 2015-06-30 13:29:02

Tags: python hadoop apache-pig jython user-defined-functions

I am trying to read a log file using Apache Pig. After reading the file, I want to apply my own user-defined functions (UDFs) written in Python. What I'm trying to do is something like the following code, but it results in ERROR 1066: Unable to open iterator for alias B, and I have been unable to find a solution via Google.

register 'userdef.py' using jython as parser;
A = LOAD 'test_data' using PigStorage() as (row);
B = FOREACH A GENERATE parser.split(A.row);
DUMP B;

However, if I replace A.row with an empty string '', the function call completes and no error occurs (but no data is passed or processed either).

What is the proper way to pass the row of data to the UDF in string format?

1 answer:

Answer 0: (score: 1)

You do not need to write A.row inside the FOREACH; row alone or $0 should work. $0 is the first column, $1 the second, and so on.
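For illustration, here is a minimal sketch of what userdef.py might look like. The question does not show the real file, so the function body, the log format (space-separated) and the output schema are all assumptions. Pig's Jython engine makes the outputSchema decorator available when the script is registered; the fallback below is only there so the file can also be run under plain Python for local testing.

```python
# userdef.py -- hypothetical contents; the original post does not show this file.

try:
    outputSchema  # made available by Pig when registered with "using jython"
except NameError:
    # Fallback so the module can also be imported outside Pig (local testing).
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("parts:tuple(first:chararray,rest:chararray)")
def split(row):
    """Split one log line on the first space (an assumed log format)."""
    if row is None:
        return None
    first, _, rest = str(row).partition(" ")
    return (first, rest)


# Assumed usage from the Pig script, passing the field directly:
#   A = LOAD 'test_data' USING PigStorage() AS (row:chararray);
#   B = FOREACH A GENERATE parser.split(row);   -- or parser.split($0)
```

Declaring the field as chararray in the LOAD statement, as in the usage comment, is one way to make sure the UDF receives a string rather than an untyped bytearray.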

Be careful: PigStorage splits each line on its delimiter (tab by default), so if a line contains that delimiter, row may hold only the first field of the line.
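The effect can be imitated in plain Python. This only mimics splitting on a tab, the default PigStorage delimiter, and keeping the first field; it is not Pig's actual loader code, and the sample log lines are made up.

```python
def first_field(line, delimiter="\t"):
    """Return what a one-field schema would keep after splitting on the delimiter."""
    return line.split(delimiter)[0]


# Made-up sample lines for illustration only.
space_separated = "127.0.0.1 GET /index.html"
tab_separated = "127.0.0.1\tGET\t/index.html"

print(first_field(space_separated))  # whole line survives: it contains no tab
print(first_field(tab_separated))    # only the text before the first tab survives
```

If the whole line is needed regardless of its contents, Pig's TextLoader(), which loads each line as a single chararray field, may be a better fit than PigStorage().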

Antony.