Spark on YARN job fails with exitCode: 1 and stderr says "Could not find or load main class"

Time: 2015-08-21 03:18:56

Tags: hadoop apache-spark yarn

We are trying to submit a simple SparkPi example to Spark on YARN. The batch file is written as follows:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 1g --executor-cores 1 .\examples\target\spark-examples_2.10-1.4.0.jar 10
pause

Our HDFS and YARN are working fine. We are using Hadoop 2.7.0 and Spark 1.4.1. We have a single node that acts as both NameNode and DataNode.

When we run it, it fails with the following log:

2015-08-21 11:07:22,044 DEBUG [main] | ===============================================================================
2015-08-21 11:07:22,044 DEBUG [main] | Yarn AM launch context:
2015-08-21 11:07:22,044 DEBUG [main] |     user class: org.apache.spark.examples.SparkPi
2015-08-21 11:07:22,044 DEBUG [main] |     env:
2015-08-21 11:07:22,044 DEBUG [main] |         CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__hadoop_conf__<CPS>{{PWD}}/__spark__.jar<CPS>%HADOOP_HOME%\etc\hadoop<CPS>%HADOOP_HOME%\share\hadoop\common\*<CPS>%HADOOP_HOME%\share\hadoop\common\lib\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\lib\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\lib\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\lib\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_FILE_SIZES -> 165181064,1420218
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1440062075415_0026
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_USER -> msrabi
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_MODE -> true
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1440126441200,1440126441575
2015-08-21 11:07:22,060 DEBUG [main] |         SPARK_YARN_CACHE_FILES -> hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar#__spark__.jar,hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar#__app__.jar
2015-08-21 11:07:22,060 DEBUG [main] |     resources:
2015-08-21 11:07:22,060 DEBUG [main] |         __app__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar" } size: 1420218 timestamp: 1440126441575 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         __spark__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar" } size: 165181064 timestamp: 1440126441200 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |         __hadoop_conf__ -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/__hadoop_conf__7908628615251032149.zip" } size: 82888 timestamp: 1440126441794 type: ARCHIVE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] |     command:
2015-08-21 11:07:22,075 DEBUG [main] |         {{JAVA_HOME}}/bin/java -server -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.app.name=org.apache.spark.examples.SparkPi' '-Dspark.executor.memory=1g' '-Dspark.driver.memory=4g' '-Dspark.master=yarn-cluster' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar file:/D:/sp/./examples/target/spark-examples_2.10-1.4.0.jar --arg '10' --executor-memory 1024m --executor-cores 1 --num-executors  3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
2015-08-21 11:07:22,075 DEBUG [main] | ===============================================================================

...........(omitting some lines)......

2015-08-21 11:07:23,231 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
2015-08-21 11:07:23,247 DEBUG [main] | 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1440126442169
     final status: UNDEFINED
     tracking URL: http://msra-sa-44:8088/proxy/application_1440062075415_0026/
     user: msrabi
2015-08-21 11:07:24,263 TRACE [main] | 1: Call -> MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_id { id: 26 cluster_timestamp: 1440062075415 }}
2015-08-21 11:07:24,263 DEBUG [IPC Parameter Sending Thread #0] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi sending #37
2015-08-21 11:07:24,263 DEBUG [IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi got value #37
2015-08-21 11:07:24,263 DEBUG [main] | Call: getApplicationReport took 0ms
2015-08-21 11:07:24,263 TRACE [main] | 1: Response <- MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_report { applicationId { id: 26 cluster_timestamp: 1440062075415 } user: "msrabi" queue: "default" name: "org.apache.spark.examples.SparkPi" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://msra-sa-44:8088/proxy/application_1440062075415_0026/" diagnostics: "" startTime: 1440126442169 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 1 num_reserved_containers: 0 used_resources { memory: 4608 virtual_cores: 1 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 4608 virtual_cores: 1 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 26 cluster_timestamp: 1440062075415 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
2015-08-21 11:07:24,263 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)

.......(omitting some lines where the state is always ACCEPTED and the final status is always UNDEFINED).....

2015-08-21 11:07:30,359 INFO [main] | Application report for application_1440062075415_0026 (state: FAILED)
2015-08-21 11:07:30,359 DEBUG [main] | 
     client token: N/A
     diagnostics: Application application_1440062075415_0026 failed 2 times due to AM Container for appattempt_1440062075415_0026_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://msra-sa-44:8088/cluster/app/application_1440062075415_0026Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1440062075415_0026_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Shell output:         1 file(s) moved.

Then we opened stderr, which says:

Error: Could not find or load main class 'Dspark.app.name=org.apache.spark.examples.SparkPi'

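This is strange: that string should be an option passed to java, yet java seems to treat it as the main class. The main class should be the one shown in the command section of the log, but that is not what java picked up.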

How can this happen? What can we do to find out what is wrong?

Thanks!

1 Answer:

Answer 0 (score: 1)

We solved the problem.

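The root cause is that, when generating the java command line, our Spark wraps arguments in single quotes ('-Dxxxx'). Single quotes only work on Linux. On Windows, arguments are either left unwrapped or wrapped in double quotes ("-Dxxxx"). The only fix we found was to edit Spark's source code and recompile it.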
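To illustrate the quoting difference, here is a minimal Python sketch. It is not Spark's code: the helper names (`quote_for_unix`, `quote_for_windows`, `quote_arg`, `first_non_option`) are hypothetical, the Windows escaping is deliberately simplified (it ignores trailing backslashes and cmd metacharacters), and `first_non_option` is only a rough model of how a launcher picks the main class, assuming it takes the first token that does not start with `-`.

```python
def quote_for_unix(arg: str) -> str:
    """Wrap arg in single quotes, POSIX-shell style (embedded ' becomes '\\'')."""
    return "'" + arg.replace("'", "'\\''") + "'"


def quote_for_windows(arg: str) -> str:
    """Wrap arg in double quotes (simplified: only embedded " is escaped)."""
    return '"' + arg.replace('"', '\\"') + '"'


def quote_arg(arg: str, is_windows: bool) -> str:
    """Pick the quoting style for the platform that will run the command."""
    return quote_for_windows(arg) if is_windows else quote_for_unix(arg)


def first_non_option(tokens):
    """Rough model of a launcher: first token not starting with '-' is the main class."""
    for token in tokens:
        if not token.startswith("-"):
            return token
    return None


if __name__ == "__main__":
    opt = "-Dspark.master=yarn-cluster"
    print(quote_arg(opt, is_windows=False))
    print(quote_arg(opt, is_windows=True))

    # If the platform does not strip single quotes, the option no longer
    # starts with '-' and is mistaken for the main class.
    unstripped = ["'" + opt + "'", "org.apache.spark.deploy.yarn.ApplicationMaster"]
    print(first_non_option(unstripped))
```

In this model, a single-quoted option that reaches the launcher with its quotes intact is selected as the main class, which resembles the stderr message in the question.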

At the moment this appears to be an issue in Spark itself. (https://issues.apache.org/jira/browse/SPARK-5754)