I have a specific filtering problem (described here: Pig - How to manipulate and compare dates?), so, as I was advised, I decided to write my own filter UDF. Here is the code:
import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;
import org.joda.time.*;
import org.joda.time.format.*;

public class DateCloseEnough extends FilterFunc {
    int nbmois;

    /*
     * @param nbmois_ if the number of months between two dates is at most this value, the two dates are considered close
     */
    public DateCloseEnough(String nbmois_) {
        nbmois = Integer.valueOf(nbmois_);
    }

    public Boolean exec(Tuple input) throws IOException {
        // We're getting the date
        String date1 = (String)input.get(0);
        // We convert it into date
        final DateTimeFormatter dtf = DateTimeFormat.forPattern("MM yyyy");
        LocalDate d1 = new LocalDate();
        d1 = LocalDate.parse(date1, dtf);
        d1 = d1.withDayOfMonth(1);
        // We're getting today's date
        DateTime today = new DateTime();
        int mois = today.getMonthOfYear();
        String real_mois;
        if(mois >= 1 && mois <= 9) real_mois = "0" + mois;
        else real_mois = "" + mois;
        LocalDate d2 = new LocalDate();
        d2 = LocalDate.parse(real_mois + " " + today.getYear(), dtf);
        d2 = d2.withDayOfMonth(1);
        // Number of months between these two dates
        String nb_months_between = "" + Months.monthsBetween(d1,d2);
        return (Integer.parseInt(nb_months_between) <= nbmois);
    }
}
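For comparison, the month-window check that exec performs can be sketched with java.time (built into JDK 8 and later) instead of Joda-Time. This is a hypothetical standalone class (DateWindow and closeEnough are invented names), not a Pig FilterFunc, and it assumes Java 8+. The reference month is passed in rather than read from the clock so the logic is testable; inside a UDF you would pass YearMonth.now(). The month count stays numeric throughout, so there is no string round trip.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the UDF's comparison logic using java.time, not Joda-Time.
public class DateWindow {
    // "uuuu" is the proleptic year; it avoids the era handling that "yyyy" triggers in java.time.
    private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MM uuuu");

    // Returns true if monthYear ("MM yyyy") is at most maxMonths months before reference.
    // Like the original UDF, a month later than reference gives a negative count and also passes.
    public static boolean closeEnough(String monthYear, YearMonth reference, int maxMonths) {
        YearMonth start = YearMonth.parse(monthYear.trim(), MONTH_YEAR);
        // ChronoUnit.between returns a long count of whole months, not a formatted string.
        long monthsBetween = ChronoUnit.MONTHS.between(start, reference);
        return monthsBetween <= maxMonths;
    }

    public static void main(String[] args) {
        YearMonth june2013 = YearMonth.of(2013, 6);
        System.out.println(closeEnough("06 2012", june2013, 12)); // 12 months apart: prints true
        System.out.println(closeEnough("05 2012", june2013, 12)); // 13 months apart: prints false
    }
}
```

Working on YearMonth rather than a full date also removes the need for the withDayOfMonth(1) normalization and the manual zero-padding of the current month.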
I built a jar of this code from Eclipse.
I filter my data with these lines:
REGISTER Desktop/myUDFs.jar
DEFINE DateCloseEnough DateCloseEnough('12');
experiences1 = LOAD '/home/training/Desktop/BDD/experience.txt' USING PigStorage(',') AS (id_cv:int, id_experience:int, date_deb:chararray, date_fin:chararray, duree:int, contenu_experience:chararray);
experiences = FILTER experiences1 BY DateCloseEnough(date_fin);
I run my script with this Linux command:
pig -x local "myScript.pig"
I get this error:
2013-06-19 07:27:17,253 [main] INFO org.apache.pig.Main - Logging error messages to: /home/training/pig_1371652037252.log
2013-06-19 07:27:17,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial Details at logfile: /home/training/pig_1371652037252.log
I checked the log file and found this:
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial
java.lang.NoClassDefFoundError: org/joda/time/ReadablePartial
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:441)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:471)
at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:544)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:4834)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PUnaryCond(QueryParser.java:1949)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PAndCond(QueryParser.java:1790)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.POrCond(QueryParser.java:1734)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PCond(QueryParser.java:1700)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FilterClause(QueryParser.java:1548)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1276)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:320)
Caused by: java.lang.ClassNotFoundException: org.joda.time.ReadablePartial
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 24 more
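The trace shows Pig resolving the UDF name through Class.forName (called from PigContext.resolveClassName), and the load fails because org.joda.time.ReadablePartial, a Joda-Time type the UDF depends on, is not visible to the class loader. A minimal sketch for checking from plain Java whether a class can be loaded on a given classpath; ClasspathCheck and isLoadable are hypothetical names, and this checks an ordinary java classpath, not Pig's own class-loading setup.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
public class ClasspathCheck {
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: no such class on the classpath.
            // LinkageError (e.g. NoClassDefFoundError): the class exists but one of its dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {"org.joda.time.ReadablePartial"};
        for (String name : names) {
            System.out.println(name + (isLoadable(name) ? " is loadable" : " could not be loaded"));
        }
    }
}
```

For example, compile it and run java -cp .:/path/to/joda-time.jar ClasspathCheck org.joda.time.ReadablePartial (the jar path is a placeholder); if the class loads only when the Joda-Time jar is listed, that jar is what Pig is missing.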
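I tried to modify my PIG_CLASSPATH variable, but it turns out this variable does not exist at all (and some other Pig scripts work fine).
Do you have any idea how to solve this?
Thanks.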
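Answer 0 (score: 1)
First, you need to tell Pig which jars you are using. See this answer: how to include external jar file using PIG. Adding the jar through Configure Build Path in Eclipse is not enough; Eclipse will not help you generate the correct jar.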
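Second, this line is wrong:
String nb_months_between = "" + Months.monthsBetween(d1,d2);
Use this instead:
int nb_months_between = Months.monthsBetween(d1,d2).getMonths();
If you read the source of Months.toString, it returns "P" + String.valueOf(getValue()) + "M"; so the string looks like "P12M", and you cannot take that value and expect to convert it to an int.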
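The same trap can be reproduced with the JDK alone, using java.time.Period as a stand-in for Joda-Time's Months (assumes Java 8+): Period also prints itself in ISO-8601 form, so string-concatenating it and calling Integer.parseInt fails. MonthCountPitfall and its methods are hypothetical names. Period can also carry years and days, so the fix here uses toTotalMonths(), which folds years into months and ignores days; with Joda-Time itself the fix is the getMonths() call from the answer.

```java
import java.time.Period;

// Hypothetical demo of the toString pitfall, using java.time.Period instead of Joda-Time's Months.
public class MonthCountPitfall {
    // Mirrors the UDF's bug: concatenate the period into a string, then parse it as an int.
    public static int parseFromToString(Period period) {
        String text = "" + period;     // ISO-8601 text such as "P12M"
        return Integer.parseInt(text); // throws NumberFormatException for "P12M"
    }

    // Fix: read the numeric value directly instead of going through toString().
    public static long totalMonths(Period period) {
        return period.toTotalMonths(); // years * 12 + months; days are ignored
    }

    public static void main(String[] args) {
        Period twelveMonths = Period.ofMonths(12);
        System.out.println(twelveMonths);              // prints P12M
        System.out.println(totalMonths(twelveMonths)); // prints 12
        try {
            parseFromToString(twelveMonths);
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejected: " + e.getMessage());
        }
    }
}
```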
Answer 1 (score: 0)
You need the jar that contains org/joda/time/ReadablePartial.
It can be found here: jarfinder
Download joda-time-1.5.jar and add it to your project; that should solve it.