SPARK ERROR executor.CoarseGrainedExecutorBackend: Driver disassociated when running KMeans clustering with Spark on an EC2 cluster

Date: 2015-06-17 23:48:54

Tags: amazon-ec2 apache-spark pyspark rdd apache-spark-mllib

I am trying to submit a job (KMeans clustering in Python) to a standalone Spark cluster on EC2. The cluster has 18 nodes. I am using the latest version of Spark (1.4.0).

I submit the job from the master as follows:

SPARK_WORKER_INSTANCES=30 SPARK_WORKER_CORES=4 SPARK_WORKER_MEMORY=30g \
SPARK_MEM=30g OUR_JAVA_MEM="30g" \
SPARK_DAEMON_JAVA_OPTS="-XX:MaxPermSize=30g -Xms30g -Xmx30g" \
./spark/bin/spark-submit app.py \
--master spark://ec2-54-174-186-17.compute-1.amazonaws.com:7077 \
--executor-memory 500G --total-executor-cores 144
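(If I understand spark-submit's argument parsing correctly, options listed after the application script are passed to the script itself rather than to spark-submit, so the flags above may not be taking effect. A sketch of the equivalent submission with the flags moved in front of app.py; I have also lowered --executor-memory to a value that fits the 30g workers, on the assumption that 500G exceeds any single node's memory:)

```shell
# Sketch: same submission, but with spark-submit options placed BEFORE
# the application script. Anything after app.py is forwarded to the
# Python program as its own arguments.
./spark/bin/spark-submit \
  --master spark://ec2-54-174-186-17.compute-1.amazonaws.com:7077 \
  --executor-memory 30g \
  --total-executor-cores 144 \
  app.py
```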

I get the following error on the workers:

15/06/17 21:10:01 INFO executor.Executor: Finished task 132.0 in stage 23.0 (TID 3444). 5802749 bytes result sent to driver
15/06/17 21:10:06 ERROR executor.CoarseGrainedExecutorBackend: Driver 172.31.23.236:41498 disassociated! Shutting down.
15/06/17 21:10:06 INFO storage.DiskBlockManager: Shutdown hook called
15/06/17 21:10:06 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@172.31.23.236:41498] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/06/17 21:10:06 INFO util.Utils: Shutdown hook called

Also, on the master I see the following:

> URL: spark://ec2-54-174-186-17.compute-1.amazonaws.com:7077
> REST URL: spark://ec2-54-174-186-17.compute-1.amazonaws.com:6066 (cluster mode)
> Workers: 18
> Cores: 144 Total, 144 Used
> Memory: 507.7 GB Total, 471.7 GB Used
> Applications: 1 Running, 8 Completed
> Drivers: 0 Running, 0 Completed
> Status: ALIVE

Looking around, I read that this CoarseGrainedExecutorBackend error occurs when the executors cannot communicate with the driver. I can access the Spark UI at http://ec2-54-174-186-17.compute-1.amazonaws.com:4040, but I am not sure whether the driver is actually running. Please let me know what I am doing wrong. Many thanks.
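(To check whether the workers can reach the driver at all, I tried something like the following from a worker node; the driver address and port are the ones from the "Driver ... disassociated" message above:)

```shell
# From a worker node: test TCP connectivity to the driver address
# reported in the executor error log.
nc -zv 172.31.23.236 41498

# Also verify the master port is reachable from the workers.
nc -zv ec2-54-174-186-17.compute-1.amazonaws.com 7077
```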

1 Answer:

Answer 0 (score: 0):

Disable the firewall on all nodes and try again.
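On EC2 this usually means allowing traffic between the instances' security group(s) rather than a host firewall, though both are worth checking. A sketch (the security group name "my-spark-cluster" is a placeholder, not from the question):

```shell
# Allow all TCP traffic between instances that share the cluster's
# security group (group name is a placeholder).
aws ec2 authorize-security-group-ingress \
  --group-name my-spark-cluster \
  --protocol tcp --port 0-65535 \
  --source-group my-spark-cluster

# On each node, also stop any local host firewall (e.g. on Amazon Linux):
sudo service iptables stop
```

Spark's driver and executors open connections to each other on ephemeral ports, so blocking any node-to-node traffic can produce exactly this "Driver disassociated" error.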