Question

When I submit a Spark job to the cluster, the shell shows the following error:

> Exception in thread "main" org.apache.spark.SparkException:
> Application application_1497125798633_0065 finished with failed status
>         at org.apache.spark.deploy.yarn.Client.run(Client.scala:1244)
>         at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1290)
>         at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 17/06/29 10:25:36 INFO ShutdownHookManager: Shutdown hook called

This is what the YARN log contains:

> Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
>         at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
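The telling line is the root cause `Failed to connect to /0.0.0.0:35994`. 0.0.0.0 is the wildcard "any interface" address: fine for binding a listening socket, but when a node connects to it, the connection goes to that node itself rather than to the machine running the driver. As a rough diagnostic aid, here is a minimal Python sketch that pulls the host and port out of such a log line and flags wildcard or loopback addresses. The helper name `unreachable_driver_address` is hypothetical and not part of Spark, and the regex assumes the `hostname/ip:port` form shown in this log.

```python
import ipaddress
import re

# Assumed log format: "Failed to connect to [hostname]/ip:port",
# e.g. "Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994"
_FAILED_CONNECT = re.compile(r"Failed to connect to [^/\s]*/([0-9.]+):(\d+)")


def unreachable_driver_address(log_line):
    """Return (ip, port) if the line reports a failed connection to an address
    that remote nodes cannot use to reach the driver; otherwise return None."""
    match = _FAILED_CONNECT.search(log_line)
    if match is None:
        return None
    try:
        ip = ipaddress.ip_address(match.group(1))
    except ValueError:  # not a well-formed IPv4 address
        return None
    port = int(match.group(2))
    # 0.0.0.0 (unspecified) and 127.x.x.x (loopback) both lead the connecting
    # node back to itself instead of to the driver's machine.
    if ip.is_unspecified or ip.is_loopback:
        return str(ip), port
    return None


if __name__ == "__main__":
    line = "Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994"
    print(unreachable_driver_address(line))  # ('0.0.0.0', 35994)
```

A routable driver address such as `node1.example.com/10.0.0.5:20002` (a placeholder) returns `None`, so only the suspicious wildcard or loopback case is reported.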

I think this means it cannot connect to the driver. I tried increasing the "spark.yarn.executor.memoryOverhead" parameter, but that didn't help.

This is the submit command I use:

/bin/spark-submit \
  --class example.Hello \
  --jars ... \
  --master yarn \
  --deploy-mode cluster \
  --supervise \
  --conf spark.yarn.driver.memoryOverhead=1024 ...(jar file path)

I'm using HDP-2.6.1.0 and Spark 2.1.1.

Answer 1

Running Spark in YARN mode (which is what I was doing) is the right way to use Spark on HDP, as explained here: https://community.hortonworks.com/questions/52591/standalone-spark-using-ambari.html

This means I shouldn't specify a standalone master or use the start-master / start-slave scripts.

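The problem was that, for some reason, the driver IP was seen as 0.0.0.0, so every cluster node tried to contact the driver on its own local interface, which failed. I fixed it by setting the following in conf/spark-defaults.conf: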

spark.driver.port = 20002

spark.driver.host = HOST_NAME

and by changing the deploy mode to client, so that the driver runs locally on the submitting machine.
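Instead of editing `conf/spark-defaults.conf`, the same two properties can also be passed per job with `spark-submit --conf`. The Python sketch below only assembles such a client-mode command line; it does not run it. The helper name `client_mode_submit_args`, the jar `app.jar`, and the host `edge-node.example.com` are hypothetical placeholders. Defaulting the host to `socket.getfqdn()` is an assumption that the submitting machine's name resolves to an address every YARN node can reach.

```python
import shlex
import socket


def client_mode_submit_args(main_class, app_jar, driver_host=None, driver_port=20002):
    """Build a spark-submit argument list mirroring the fix above: client deploy
    mode plus an explicit, routable driver host and a fixed driver port."""
    if driver_host is None:
        # Assumption: this machine's FQDN is resolvable from every YARN node.
        driver_host = socket.getfqdn()
    return [
        "spark-submit",
        "--class", main_class,
        "--master", "yarn",
        "--deploy-mode", "client",  # driver runs on the submitting machine
        "--conf", f"spark.driver.host={driver_host}",
        "--conf", f"spark.driver.port={driver_port}",
        app_jar,
    ]


if __name__ == "__main__":
    args = client_mode_submit_args("example.Hello", "app.jar", "edge-node.example.com")
    print(shlex.join(args))
    # spark-submit --class example.Hello --master yarn --deploy-mode client --conf spark.driver.host=edge-node.example.com --conf spark.driver.port=20002 app.jar
```

Pinning `spark.driver.port` to a fixed value such as 20002 also makes it easier to open that one port in any firewall between the YARN nodes and the submitting machine.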

Answer 2

See:

Caused by: java.io.IOException: Failed to connect to /0.0.0.0:35994

Try submitting the job with spark-submit --master <master-ip>:<spark-port>.

Failed to connect to the Spark driver when submitting a job in YARN mode
