ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. could not instantiate

Asked: 2014-01-09 13:16:37

Tags: java apache-pig user-defined-functions

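Below is the script I am trying to run. I have registered my UDF jar, hotornot_09_01_14_second.jar, and then call the UDF directly by its fully qualified class name. Note that in this case I am not using a DEFINE statement. Unfortunately, if I try the same thing with DEFINE, I get the same error, except that it says 'venues_regex.txt' instead of 'null'.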

REGISTER '/somepath/piggybank.jar';
REGISTER '/somepath/mysql-connector-java-5.1.18-bin.jar';
REGISTER '/somepath/hotornot_09_01_14_second.jar';

--DEFINE GenerateVenueUDF com.anton.hadoop.pig.production.GenerateVenueUDF('venues_regex.txt');

venues = LOAD 'venues_extended_2.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Name:chararray, Type:chararray, Latitude:double, Longitude:double, City:chararray, Country:chararray);
tweets = LOAD 'tweets_extended.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Text:chararray, WeekDay:chararray, Day:int, Time:chararray, SMT:chararray, Year:int, Location:chararray, Language:chararray, Followers_count:int, Friends_count:int);

tweetsReduced = foreach tweets generate Text;

venuesTweets = foreach tweetsReduced generate *, com.anton.hadoop.pig.production.GenerateVenueUDF(Text);

venueCounts = FOREACH (GROUP venuesTweets BY $1) GENERATE group, COUNT($1) as counter;
venueCountsOrdered = order venueCounts by counter;

--DUMP venueCountsOrdered;

STORE venueCountsOrdered INTO 'VenueData' USING org.apache.pig.piggybank.storage.DBStorage(some connection details);

I get this error: ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. could not instantiate 'com.anton.hadoop.pig.production.GenerateVenueUDF' with arguments 'null'

Here is my UDF:

package com.anton.hadoop.pig.production;

import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class GenerateVenueUDF extends EvalFunc<String> {
    private String regex;
    private static Pattern p;

    public GenerateVenueUDF() throws IOException {
        String fileName = "venues_regex.txt";
        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
        Scanner sc = new Scanner(fs.open(new Path(fileName)));
        regex = sc.nextLine(); // should be one line only !!!
        p = Pattern.compile(regex);
        sc.close();
    }

    @Override
    public String exec(Tuple tuple) throws IOException {
        // expect one string
        if (tuple == null) {
            throw new IllegalArgumentException(
                    "BagTupleExampleUDF: requires at least one input parameter.");
        }
        try {
            String tweet = (String) tuple.get(0);
//          TupleFactory tf = TupleFactory.getInstance();
//          BagFactory mBagFactory = BagFactory.getInstance();
//          Tuple t = tf.newTuple();
//          t.append(tweet);
//          t.append(checkVenue(tweet));
//          DataBag output = mBagFactory.newDefaultBag();
//          output.add(t);
            return checkVenue(tweet);
        } catch (Exception e) {
            throw new IOException(
                    "BagTupleExampleUDF: caught exception processing input.", e);
        }
    }

    public static String checkVenue(String tweet) {
        Matcher m = p.matcher(tweet);
        if (m.find()) {
            return m.group(1);
        } else {
            return "";
        }
    }

}

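In this version the constructor takes no arguments, but as mentioned above, if I DEFINE the UDF and pass the file name as a constructor argument, I still get a similar error. Can anyone help me resolve this? Any suggestions are very welcome, thanks!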

1 answer:

Answer 0 (score: 0)

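This error is reported when an exception is thrown while the UDF is being instantiated. Most likely something goes wrong inside the constructor.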
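To make that concrete, here is a minimal sketch of how a constructor failure ends up hidden behind a generic "could not instantiate" style message when a class is created reflectively. It uses only the JDK; `FailingUdf` and `InstantiationDebug` are hypothetical names invented for this illustration, and the null dereference merely mimics a constructor that touches a configuration object that is not available yet.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

// Hypothetical stand-in for a UDF whose constructor fails.
class FailingUdf {
    FailingUdf() {
        String conf = null;   // mimics a configuration that is not available yet
        conf.length();        // throws NullPointerException
    }
}

class InstantiationDebug {
    // Calls the no-arg constructor reflectively and returns the root cause
    // of a failure, or null if the instance was created successfully.
    static Throwable rootCauseOfInstantiation(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.newInstance();
            return null;
        } catch (InvocationTargetException e) {
            // The exception thrown by the constructor is wrapped; unwrap it.
            Throwable cause = e.getCause();
            while (cause.getCause() != null) {
                cause = cause.getCause();
            }
            return cause;
        } catch (ReflectiveOperationException e) {
            // No such constructor, abstract class, inaccessible constructor, etc.
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable cause = rootCauseOfInstantiation(FailingUdf.class);
        System.out.println("Root cause: " + cause);
    }
}
```

Running the same kind of check against the real UDF class (with the Pig and Hadoop jars on the classpath) should surface the underlying exception rather than only the wrapper message.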

I would add some logging to the constructor, or build a unit test with PigUnit, to find out what is going wrong.
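If the constructor does turn out to be the culprit, one common way around it is to keep the constructor free of I/O and load the regex lazily on first use. The sketch below shows only that pattern, under stated assumptions: `LazyVenueMatcher` is a hypothetical JDK-only stand-in (it does not extend `EvalFunc` and reads a local file instead of going through Hadoop's `FileSystem`), and whether the job configuration is actually unavailable when the constructor runs is not confirmed in this thread. In the real UDF, the equivalent of `pattern()` would be called from `exec`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the UDF: no Pig or Hadoop types, JDK only.
class LazyVenueMatcher {
    private final String fileName;
    private Pattern pattern; // compiled on first use, not in the constructor

    LazyVenueMatcher(String fileName) {
        this.fileName = fileName; // only remember the name; no I/O here
    }

    // Reads the single-line regex and compiles it the first time it is needed.
    private Pattern pattern() throws IOException {
        if (pattern == null) {
            try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
                String regex = reader.readLine();
                if (regex == null) {
                    throw new IOException("Empty regex file: " + fileName);
                }
                pattern = Pattern.compile(regex);
            }
        }
        return pattern;
    }

    // Returns the first capture group of the first match, or "" if none.
    String checkVenue(String tweet) throws IOException {
        if (tweet == null) {
            return "";
        }
        Matcher m = pattern().matcher(tweet);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) throws IOException {
        // Sample regex and tweet are made up for the demonstration.
        Path file = Files.createTempFile("venues_regex", ".txt");
        Files.write(file, "(Wembley|Anfield)\n".getBytes(StandardCharsets.UTF_8));
        LazyVenueMatcher matcher = new LazyVenueMatcher(file.toString());
        System.out.println(matcher.checkVenue("Great night at Anfield"));
        Files.delete(file);
    }
}
```

With this shape, constructing the object can no longer fail because of a missing file or configuration; any such problem shows up on the first call instead, with a more specific exception. It also replaces the static `Pattern` field with an instance field, so the pattern is not shared implicitly between instances.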