Question

Here is the UDF code:

package myudf;
import java.io.IOException; 
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.pig.EvalFunc; 
import org.apache.pig.data.Tuple; 

public class DateFormat extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }

        try {
            String dateStr = (String)input.get(0);
            SimpleDateFormat readFormat = new SimpleDateFormat( "MM/dd/yyyy hh:mm:ss.SSS aa");
            SimpleDateFormat writeFormat = new SimpleDateFormat( "yyyy-MM-dd HH:mm:ss.SSS");
            Date date = null;
            try {
                date = readFormat.parse(dateStr);
            } catch (ParseException e) {
                e.printStackTrace();
            }

            return writeFormat.format(date).toString();
        } catch(Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}

I exported it as a jar and registered it in grunt:

    Register /local/path/to/UDFDate.jar;
    A = LOAD 'hdfs date file';
    B = FOREACH A GENERATE UDFDate.myudf.DateFormat($0);

This gives the error:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve UDFDate.DateFormat using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Answer 1

You don't need to specify the jar name (UDFDate.myudf.DateFormat) to call a function in the jar. The name should be "packageName.className" (myudf.DateFormat).

If DateFormat is in the myudf package, call it like this:

B = FOREACH A GENERATE myudf.DateFormat($0);

If DateFormat is in the default package, call it like this:

B = FOREACH A GENERATE DateFormat($0);
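
To see why the jar name gets in the way, it may help to mimic the lookup that the ERROR 1070 message hints at: each import prefix from the list in the message is prepended to the name, and the result is loaded as a class. The sketch below is only an illustration under that assumption, not Pig's actual code. The class `UdfNameResolution` and the helper `resolve` are hypothetical, and `java.text.SimpleDateFormat` stands in for `myudf.DateFormat` so that the example runs without the jar.

```java
import java.util.Arrays;
import java.util.List;

public class UdfNameResolution {
    // Import prefixes copied from the ERROR 1070 message; the empty
    // prefix means "use the name exactly as written".
    static final List<String> IMPORTS = Arrays.asList(
            "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");

    // Hypothetical helper: returns the first prefix + name that loads
    // as a class, or null if no prefix works.
    static String resolve(String name) {
        for (String prefix : IMPORTS) {
            try {
                return Class.forName(prefix + name).getName();
            } catch (ClassNotFoundException | LinkageError e) {
                // Not a class under this prefix; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // package.class resolves through the empty prefix
        System.out.println(resolve("java.text.SimpleDateFormat"));
        // a bare class name resolves through the "java.lang." prefix
        System.out.println(resolve("String"));
        // a jar name in front is just an unknown package, so nothing matches
        System.out.println(resolve("UDFDate.java.text.SimpleDateFormat"));
    }
}
```

The first two calls print `java.text.SimpleDateFormat` and `java.lang.String`; the third prints `null`, which corresponds to the "Could not resolve" case in the question.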

Answer 2

Call your UDF as:

packagename.classname($0);

Answer 3

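The answer has already been given, but to avoid writing out the fully qualified UDF name on every call, you can define an alias once with DEFINE: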

Register /local/path/to/UDFDate.jar;
DEFINE myDateFormat myudf.DateFormat();
A = LOAD 'hdfs date file';
B = FOREACH A GENERATE myDateFormat($0);

Pig Java UDF problem
