Job.setJar does not seem to work

Date: 2014-06-17 02:59:09

Tags: java hadoop yarn

I am trying to track down why my Hadoop application throws a java.lang.ClassNotFoundException:

WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job:  map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED     

 Error: java.lang.ClassNotFoundException:
 com.tinkerpop.blueprints.util.DefaultVertexQuery
         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
         at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

The application does build a "fat" jar in which all the dependency jars, including the one that contains the missing class, are packaged under the lib directory, and the application does call Job.setJar with this fat jar.

The code does nothing unusual:

    job.setJar(hadoopFileJar);
    ...
    boolean success = job.waitForCompletion(true);

I also checked the configuration in yarn-site.xml and verified that the job directory under yarn.nodemanager.local-dirs contains the jar (although renamed to job.jar), along with a lib directory holding the extracted jars. In other words, the jar containing the missing class is there. YARN/MR recreates this directory with all the required files after every job submission, so the files do get shipped.
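To double-check what actually gets localized, the jars in the container directory can be listed with a short recursive walk. This is a minimal pure-JDK sketch; the class name `JarLister` and the directory argument are mine, not part of the job:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarLister {
    // Recursively collect every *.jar under a directory, e.g. the
    // localized container dir beneath yarn.nodemanager.local-dirs.
    static List<Path> jarsUnder(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(p -> p.toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        // pass the usercache/.../container_... directory as the argument
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : jarsUnder(root)) {
            System.out.println(p);
        }
    }
}
```

Running it against the container directory should print job.jar plus every dependency jar under lib/, which is exactly what I see.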

So far I have found that the CLASSPATH environment variable of the Java worker process that executes the failing code is set to

    C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar

This jar contains only a MANIFEST.MF, and that manifest's Class-Path attribute lists the paths to the directory holding the "fat" jar file and its contents:


    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/job.jar
    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/classes/
    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetainfo
    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.split
    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.xml
    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
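The Class-Path attribute of such a "classpath jar" can also be inspected programmatically. A minimal sketch using `java.util.jar.Manifest` (the manifest text below is a shortened, made-up example in the same shape, not the real one):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.jar.Manifest;

public class ManifestClassPath {
    // Return the whitespace-separated entries of a manifest's Class-Path attribute.
    static String[] classPathEntries(byte[] manifestBytes) throws IOException {
        Manifest mf = new Manifest(new ByteArrayInputStream(manifestBytes));
        String cp = mf.getMainAttributes().getValue("Class-Path");
        return cp == null ? new String[0] : cp.trim().split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        // shortened, hypothetical manifest resembling the generated one
        String mf = "Manifest-Version: 1.0\r\n"
                + "Class-Path: file:/c:/app/job.jar/job.jar file:/c:/app/job.jar/\r\n"
                + "\r\n";
        for (String entry : classPathEntries(mf.getBytes("UTF-8"))) {
            System.out.println(entry);
        }
    }
}
```

To inspect the real file, read the MANIFEST.MF bytes out of the classpath-*.jar and pass them to the same method.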

However, this classpath never adds the jars inside that directory explicitly. The directory listed in the manifest, file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/, does contain the jar with the class YARN cannot find (this directory holds all the jars from the "fat" jar's lib section). But to the Java world such a classpath setting looks wrong: a bare directory entry does not pick up jars inside it, so the directory would have to be included with a wildcard, e.g.

    file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
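That a bare directory entry does not expose jars sitting inside it can be reproduced with the JDK alone. This is a minimal sketch, not Hadoop code; the temp directory and resource names are invented. A directory URL on a URLClassLoader serves loose files under it, but jars that merely sit in the directory stay invisible (wildcard expansion of entries like dir/* is done only by the java launcher for -cp, never by the class loader):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class DirClasspathDemo {
    // Returns {foundLooseFile, foundFileInsideJar} for a directory classpath entry.
    static boolean[] check() throws Exception {
        File dir = Files.createTempDirectory("cpdemo").toFile();

        // a loose resource directly inside the directory
        Files.write(new File(dir, "loose.txt").toPath(), "x".getBytes("UTF-8"));

        // a resource packed inside a jar that sits in the same directory
        try (JarOutputStream jar = new JarOutputStream(
                new FileOutputStream(new File(dir, "dep.jar")))) {
            jar.putNextEntry(new ZipEntry("packed.txt"));
            jar.write("y".getBytes("UTF-8"));
            jar.closeEntry();
        }

        try (URLClassLoader cl = new URLClassLoader(
                new URL[] { dir.toURI().toURL() }, null)) {
            return new boolean[] {
                cl.getResource("loose.txt") != null,   // directory entry works for loose files
                cl.getResource("packed.txt") != null   // ...but jars in the directory are not scanned
            };
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = check();
        System.out.println("loose file visible:   " + r[0]);
        System.out.println("jar contents visible: " + r[1]);
    }
}
```

This matches the failure mode above: the jar with the missing class is physically present in the job.jar/ directory, yet nothing on the classpath actually names it.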

What am I doing wrong in passing the dependencies to YARN? Is this a cluster configuration problem, or could it be a bug in my Hadoop distribution (HDP 2.1, Windows x64)?

0 Answers:

No answers yet.