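Below is the script I am trying to run. I have registered my UDF jar file hotornot_09_01_14_second.jar and then call the UDF directly by its full class name. Note that in this case I am not using a DEFINE statement. Unfortunately, if I try the same thing with DEFINE, I get the same error, except that instead of 'null' it says 'venues_regex.txt'.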
REGISTER '/somepath/piggybank.jar';
REGISTER '/somepath/mysql-connector-java-5.1.18-bin.jar';
REGISTER '/somepath/hotornot_09_01_14_second.jar';
--DEFINE GenerateVenueUDF com.anton.hadoop.pig.production.GenerateVenueUDF('venues_regex.txt');
venues = LOAD 'venues_extended_2.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Name:chararray, Type:chararray, Latitude:double, Longitude:double, City:chararray, Country:chararray);
tweets = LOAD 'tweets_extended.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Text:chararray, WeekDay:chararray, Day:int, Time:chararray, SMT:chararray, Year:int, Location:chararray, Language:chararray, Followers_count:int, Friends_count:int);
tweetsReduced = foreach tweets generate Text;
venuesTweets = foreach tweetsReduced generate *, com.anton.hadoop.pig.production.GenerateVenueUDF(Text);
venueCounts = FOREACH (GROUP venuesTweets BY $1) GENERATE group, COUNT($1) as counter;
venueCountsOrdered = order venueCounts by counter;
--DUMP venueCountsOrdered;
STORE venueCountsOrdered INTO 'VenueData' USING org.apache.pig.piggybank.storage.DBStorage(some connection details);
I get this error: ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. could not instantiate 'com.anton.hadoop.pig.production.GenerateVenueUDF' with arguments 'null'
Here is my UDF:
package com.anton.hadoop.pig.production;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;
public class GenerateVenueUDF extends EvalFunc<String> {
    private String regex;
    private static Pattern p;

    public GenerateVenueUDF() throws IOException {
        String fileName = "venues_regex.txt";
        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
        Scanner sc = new Scanner(fs.open(new Path(fileName)));
        regex = sc.nextLine(); // should be one line only !!!
        p = Pattern.compile(regex);
        sc.close();
    }

    @Override
    public String exec(Tuple tuple) throws IOException {
        // expect one string
        if (tuple == null) {
            throw new IllegalArgumentException(
                    "BagTupleExampleUDF: requires at least one input parameter.");
        }
        try {
            String tweet = (String) tuple.get(0);
            // TupleFactory tf = TupleFactory.getInstance();
            // BagFactory mBagFactory = BagFactory.getInstance();
            // Tuple t = tf.newTuple();
            // t.append(tweet);
            // t.append(checkVenue(tweet));
            // DataBag output = mBagFactory.newDefaultBag();
            // output.add(t);
            return checkVenue(tweet);
        } catch (Exception e) {
            throw new IOException(
                    "BagTupleExampleUDF: caught exception processing input.", e);
        }
    }

    public static String checkVenue(String tweet) {
        Matcher m = p.matcher(tweet);
        if (m.find()) {
            return m.group(1);
        } else {
            return "";
        }
    }
}
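For reference, the matching step in checkVenue can be exercised on its own in plain Java. This is a minimal sketch: the regex below is a hypothetical stand-in for the one-line contents of venues_regex.txt, which is not shown here, and CheckVenueDemo is an illustrative name. The pattern must contain at least one capturing group, because checkVenue returns m.group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckVenueDemo {
    // Hypothetical stand-in for the single line in venues_regex.txt.
    static final Pattern P = Pattern.compile("\\b(Central Park|Times Square)\\b");

    // Same logic as GenerateVenueUDF.checkVenue: return capture group 1
    // of the first match, or "" when nothing matches.
    static String checkVenue(String tweet) {
        Matcher m = P.matcher(tweet);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(checkVenue("Lunch in Central Park today"));    // prints: Central Park
        System.out.println(checkVenue("Nothing to see here").isEmpty()); // prints: true
    }
}
```

If the regex had no capturing group, m.group(1) would throw an IndexOutOfBoundsException, which exec() would rethrow wrapped in an IOException while the job runs, not at parse time.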
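In this case the constructor takes no arguments, but, as I mentioned above, if I DEFINE the UDF and pass fileName as an argument, I still get a similar error.
Can anyone help me fix this error? Any suggestions are very welcome, thanks!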
Answer 0 (score: 0)
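This happens when an exception is thrown while the UDF is being instantiated, so something is probably going wrong in the constructor. Because Pig instantiates the UDF on the client while it parses the script, the failure is reported as a parsing error.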
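The mechanism can be illustrated without Pig. The sketch below is plain Java and is not Pig's actual loading code: it instantiates a class by name through reflection, the general technique a framework uses to load a class named in a script. A constructor that throws surfaces only as an InvocationTargetException wrapping the real cause, and a class without a matching public constructor fails with NoSuchMethodException (for the DEFINE variant, a public constructor taking a String is needed). In the question's UDF, one likely culprit (an assumption, not verified here) is UDFContext.getUDFContext().getJobConf(): Pig's documentation describes it as backend-only and says it returns null on the frontend, so FileSystem.get(...) would most likely fail while the script is still being parsed. FailingUdf and lookupConfig are hypothetical names.

```java
import java.lang.reflect.InvocationTargetException;

public class ReflectiveInstantiationDemo {
    // Hypothetical UDF-like class whose no-arg constructor fails, much as a
    // constructor would if the configuration it needs is not available yet.
    public static class FailingUdf {
        public FailingUdf() {
            lookupConfig().toString(); // throws NullPointerException
        }

        // Hypothetical lookup that returns null, standing in for a config
        // object that only exists at a later stage.
        private static Object lookupConfig() {
            return null;
        }
    }

    // Instantiates a class by name via its public no-arg constructor and
    // reports what went wrong.
    static String tryInstantiate(String className) {
        try {
            Class.forName(className).getConstructor().newInstance();
            return "ok";
        } catch (InvocationTargetException e) {
            // The constructor itself threw; the real cause is wrapped.
            return "constructor threw " + e.getCause().getClass().getSimpleName();
        } catch (ReflectiveOperationException e) {
            // Class not found, no matching public constructor, etc.
            return "could not instantiate: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // prints: constructor threw NullPointerException
        System.out.println(tryInstantiate(FailingUdf.class.getName()));
    }
}
```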
I would add some logging to the constructor, or write a unit test with PigUnit, to find out exactly what fails.
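One way to make the constructor harmless is to keep it free of I/O and load the regex lazily on first use. The sketch below is plain Java so that it runs and can be unit-tested without Pig or Hadoop. LazyVenueMatcher is a hypothetical name, and java.nio stands in for Hadoop's FileSystem. In a real EvalFunc<String>, the assumed equivalent would be a public GenerateVenueUDF(String fileName) constructor that only stores the name (matching DEFINE GenerateVenueUDF('venues_regex.txt')), with the open-and-compile step moved into exec(), which runs on the backend. The pattern is also kept per instance rather than static.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVenueMatcher {
    private final String regexFile; // path only; no I/O in the constructor
    private Pattern pattern;        // per instance, compiled on first use

    public LazyVenueMatcher(String regexFile) {
        this.regexFile = regexFile;
    }

    // Reads and compiles the regex the first time it is needed.
    private Pattern pattern() throws IOException {
        if (pattern == null) {
            List<String> lines = Files.readAllLines(Paths.get(regexFile), StandardCharsets.UTF_8);
            if (lines.isEmpty()) {
                throw new IOException("regex file is empty: " + regexFile);
            }
            pattern = Pattern.compile(lines.get(0)); // first line only, as in the question
        }
        return pattern;
    }

    // Counterpart of exec(): capture group 1 of the first match, or "".
    public String checkVenue(String tweet) throws IOException {
        if (tweet == null) {
            return null;
        }
        Matcher m = pattern().matcher(tweet);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("venues_regex", ".txt");
        // Hypothetical regex standing in for the real venues_regex.txt.
        Files.write(tmp, Collections.singletonList("\\b(Central Park|Times Square)\\b"),
                StandardCharsets.UTF_8);
        LazyVenueMatcher m = new LazyVenueMatcher(tmp.toString());
        System.out.println(m.checkVenue("Walking through Times Square")); // prints: Times Square
        Files.delete(tmp);
    }
}
```

Because the constructor no longer touches the file system, instantiation cannot fail for I/O reasons at parse time. A missing or empty file instead surfaces as an IOException on the first call, which is easier to log and to cover in a unit test.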