Question

我有一台2台计算机的集群，我正在尝试使用YARN集群管理器提交spark作业。

vanilla Spark 1.6.2 build aginst hadoop 2.6.2
vanilla Hadoop 2.7.2

我可以使用独立的集群管理器成功运行map-reduce作业并激活作业。但是当我用YARN运行时，我收到了一个错误。

有关如何使其发挥作用的任何建议吗？
如何启用更详细的日志记录？错误信息绝对不清楚
为什么在hadoop / logs / userlogs / applicationXXX下没有创建日志文件？
修辞问题：IMO：hadoop logging＆amp;诊断并不是很好。这是为什么？ Hadoop似乎是一个成熟的产品。

以下是输出：

mike@mp-desktop ~/opt/hadoop $ spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster  ~/prg/scala/spark-examples_2.11-1.0.jar     10
16/07/09 08:59:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/09 08:59:01 INFO client.RMProxy: Connecting to ResourceManager at mp-desktop/192.168.1.60:8050
16/07/09 08:59:01 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
16/07/09 08:59:01 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/07/09 08:59:01 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/07/09 08:59:01 INFO yarn.Client: Setting up container launch context for our AM
16/07/09 08:59:01 INFO yarn.Client: Setting up the launch environment for our AM container
16/07/09 08:59:01 INFO yarn.Client: Preparing resources for our AM container
16/07/09 08:59:02 INFO yarn.Client: Uploading resource file:/home/mike/opt/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-assembly-1.6.2-hadoop2.6.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/home/mike/prg/scala/spark-examples_2.11-1.0.jar -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/spark-examples_2.11-1.0.jar
16/07/09 08:59:06 INFO yarn.Client: Uploading resource file:/tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757/__spark_conf__7114661171911035574.zip -> hdfs://mp-desktop:9000/user/mike/.sparkStaging/application_1468043888852_0001/__spark_conf__7114661171911035574.zip
16/07/09 08:59:06 INFO spark.SecurityManager: Changing view acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: Changing modify acls to: mike
16/07/09 08:59:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mike); users with modify permissions: Set(mike)
16/07/09 08:59:07 INFO yarn.Client: Submitting application 1 to ResourceManager
16/07/09 08:59:07 INFO impl.YarnClientImpl: Submitted application application_1468043888852_0001
16/07/09 08:59:08 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:08 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1468043947113
     final status: UNDEFINED
     tracking URL: http://mp-desktop:8088/proxy/application_1468043888852_0001/
     user: mike
16/07/09 08:59:09 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:10 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:11 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:12 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:13 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:14 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:15 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:16 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:17 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:18 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:19 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:20 INFO yarn.Client: Application report for application_1468043888852_0001 (state: ACCEPTED)
16/07/09 08:59:21 INFO yarn.Client: Application report for application_1468043888852_0001 (state: FAILED)
16/07/09 08:59:21 INFO yarn.Client: 
     client token: N/A
     diagnostics: Application application_1468043888852_0001 failed 2 times due to AM Container for appattempt_1468043888852_0001_000002 exited with  exitCode: -1
For more detailed output, check application tracking page:http://mp-desktop:8088/cluster/app/application_1468043888852_0001Then, click on links to logs of each attempt.
Diagnostics: File /home/mike/hadoopstorage/nm-local-dir/usercache/mike/appcache/application_1468043888852_0001/container_1468043888852_0001_02_000001 does not exist
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1468043947113
     final status: FAILED
     tracking URL: http://mp-desktop:8088/cluster/app/application_1468043888852_0001
     user: mike
16/07/09 08:59:21 INFO yarn.Client: Deleting staging directory .sparkStaging/application_1468043888852_0001
Exception in thread "main" org.apache.spark.SparkException: Application application_1468043888852_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/07/09 08:59:21 INFO util.ShutdownHookManager: Shutdown hook called
16/07/09 08:59:21 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ee6dfd6-e9d3-4ca4-9e98-5ce9e75dc757

谢谢！

Answer 1

我的错误信息类似：

16/07/15 13:55:53 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:54 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:55 INFO Client: Application report for application_1468583505911_0002 (state: ACCEPTED)
16/07/15 13:55:56 INFO Client: Application report for application_1468583505911_0002 (state: FAILED)
16/07/15 13:55:56 INFO Client:
         client token: N/A
         diagnostics: Application application_1468583505911_0002 failed 2 times due to AM Container for appattempt_1468583505911_0002_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://<redacted>:8088/cluster/app/application_1468583505911_0002Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
java.io.FileNotFoundException: File does not exist: hdfs://<redacted>:8020/user/root/.sparkStaging/application_1468583505911_0002/__spark_conf__4995486282135454270.zip
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1367)
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1359)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1359)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)

尝试在客户端模式下运行YARN而不是群集模式，它会将驱动程序日志输出到shell：

spark-submit --class myClass - master yarn /path/to/myClass.jar

日志输出显示myClass立即失败，因为我的args数量不正确（类预期超过1 arg）。我的自定义退出代码（42）和打印＆＃34;用法＆＃34;信息到日志，让我解决实际问题。

当我使用--master yarn-cluster运行时，这个输出对我来说是不可见的，我看不到＆＃34;用法＆＃34;上述信息。相反，我所拥有的只是模糊的＆＃34;文件不存在＆＃34;上面显示的问题。

为myClass指定正确数量的参数解决了这个问题。

此时我假设我的Spark作业失败得如此之快，以至于它开始清理它在YARN检查之前复制的.sparkStaging文件。

Answer 2

可能你已经解决了你的问题，但我今天早上在纱线集群中使用Spark 2.1遇到了同样的问题，我找到了这篇文章。我遇到了和你一样的错误，我的问题是需要的火花会议对象：

conf = (SparkConf()
         .setMaster("yarn") #I had this value as local
         .setAppName("My app Name")

所以当我改变这个并且我的火花提交（--master yarn --deploy-mode cluster）时，一切正常。

Answer 3

我通过以下语句解决了这个问题，并使用CDH5.14.1和spark 1.6

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10

这里是链接：https://spark.apache.org/docs/1.6.0/running-on-yarn.html

Spark 1.6.2＆amp; YARN：诊断：应用程序失败2次，因为AM容器已退出exitCode：-1

3 个答案: