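I'm working on a project that uses Oozie to schedule Hadoop jobs. Recently, however, Oozie has been throwing java.lang.ClassNotFoundException from time to time. I checked the error log, and I'm quite sure that all the required jar files are in the lib directory on HDFS. Below is the Hadoop task log; the last 10 lines show the jar files I need. But when I checked the distcache directory on the node, it was empty. It doesn't happen every time, only a few hours after the last run of this workflow. So I suspect that Hadoop cleaned up the distcache and the jar files were not copied to the distcache directory again on the next run, while Oozie still put that same, now empty, directory on the classpath. Has anyone run into the same problem? I can't think of a good solution for this.
I'm using Oozie 3.2.0-incubating with Hadoop 1.1.1.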
Classpath :
------------------------
/home/workspace/hadoop/libexec/../conf
/usr/java/default/lib/tools.jar
/* some jars from hadoop */
/home/data7/mapred_tmp/taskTracker/distcache/-6071601324996771729_2013238955_873176406/localhost/user/supertool/oozie-supe/0000232-140509184943733-oozie-supe-W/begin--java/java-launcher.jar
/home/data9/mapred_tmp/taskTracker/distcache/-4677386048903657010_1227144840_1337300706/localhost/user/supertool/plannex/app/schedule/lib/mysql-connector-java-5.1.29-bin.jar
/home/data10/mapred_tmp/taskTracker/distcache/-8328135876058302714_-1519042818_64290738/localhost/user/supertool/plannex/app/schedule/lib/plannex-schedule-2.0.0-SNAPSHOT-jar-with-dependencies.jar
/home/data11/mapred_tmp/taskTracker/distcache/-3456058783425455308_886532069_1155570996/localhost/user/supertool/plannex/app/schedule/lib/postgresql-9.1-903.jdbc3.jar
/home/data12/mapred_tmp/taskTracker/distcache/7890488265085818377_2040166227_64563179/localhost/user/supertool/plannex/app/schedule/lib/sqoop-1.4.4.jar
/home/data9/mapred_tmp/taskTracker/distcache/-4677386048903657010_1227144840_1337300706/localhost/user/supertool/plannex/app/schedule/lib/mysql-connector-java-5.1.29-bin.jar
/home/data10/mapred_tmp/taskTracker/distcache/-8328135876058302714_-1519042818_64290738/localhost/user/supertool/plannex/app/schedule/lib/plannex-schedule-2.0.0-SNAPSHOT-jar-with-dependencies.jar
/home/data11/mapred_tmp/taskTracker/distcache/-3456058783425455308_886532069_1155570996/localhost/user/supertool/plannex/app/schedule/lib/postgresql-9.1-903.jdbc3.jar
/home/data12/mapred_tmp/taskTracker/distcache/7890488265085818377_2040166227_64563179/localhost/user/supertool/plannex/app/schedule/lib/sqoop-1.4.4.jar
/home/data3/mapred_tmp/taskTracker/supertool/jobcache/job_201405231920_0043/attempt_201405231920_0043_m_000000_0/work
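One way to confirm that the classpath really points at files that are no longer on the node is to check each entry from inside the launched JVM. The following is only a minimal sketch using the Java standard library; the class name `ClasspathCheck` and the method `missingEntries` are hypothetical names chosen for illustration and are not part of Oozie or Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Returns the classpath entries that do not exist on the local filesystem. */
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<String>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // skip empty segments such as "a::b"
            }
            String path = entry;
            if (path.endsWith("*")) {
                // Wildcard entry (e.g. "lib/*"): check the directory itself.
                path = path.substring(0, path.length() - 1);
                if (path.isEmpty()) {
                    path = ".";
                }
            }
            if (!new File(path).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Inspect the classpath of the JVM this code is running in.
        String classpath = System.getProperty("java.class.path");
        for (String entry : missingEntries(classpath)) {
            System.err.println("MISSING: " + entry);
        }
    }
}
```

Calling `ClasspathCheck.missingEntries(System.getProperty("java.class.path"))` at the start of the main class run by the Oozie java action, and logging the result, would show whether the distcache jars were missing at the moment the exception occurred.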
Answer 0 (score: 1)
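If it is a MapReduce job, use the "-libjars" option so that the files are copied to the distributed cache on every run. You can also point it at HDFS locations.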
Answer 1 (score: 1)
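Oozie simply passes along the arguments you provide, much like running a command in a shell. Try the following, where you pass a comma-separated list of HDFS locations of the dependency jars. Make sure your underlying MapReduce code uses GenericOptionsParser / implements Tool, since that is what interprets "-libjars".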
<java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <main-class>foo.main</main-class>
    <arg>-libjars</arg>
    <arg>hdfs://namenode/abc.jar,hdfs://namenode/xyz.jar</arg>
    <arg>args1</arg>
    <arg>args2</arg>
</java>