Why do my Spark executors keep launching and exiting?

Asked: 2018-08-31 21:48:48

Tags: python apache-spark pyspark

I'm trying to run a simple Python script on a standalone Spark cluster. The cluster has one node running `bin/start-master.sh` and two nodes running `bin/start-slave.sh`. Looking at the Spark UI on the master node, I can see that the master sees the workers. Here is my small Python test script:

```python
from pyspark import SparkContext

def add_three(num: int):
    return num + 3

print("Initializing spark context....")
sc = SparkContext(appName="test.py")

arr = [x for x in range(1000)]
print(f'Initial array: {arr}')

res = (sc.parallelize(arr)
       .map(lambda x: add_three(x))
       .collect())

print(f'Transformed array: {res}')
sc.stop()
```

I run this on a separate node with the following command:

```
bin/spark-submit --master spark://spark-master:7077 test.py
```

This gets things started, and I can see the application in the master UI. In the output, the initial array is printed, but then executors continuously exit and launch. Here is the master log:

```
2018-08-31 21:23:12 INFO Master:54 - I have been elected leader! New state: ALIVE
2018-08-31 21:23:18 INFO Master:54 - Registering worker 10.1.2.93:38905 with 1 cores, 1024.0 MB RAM
2018-08-31 21:23:20 INFO Master:54 - Registering worker 10.1.1.107:36421 with 1 cores, 1024.0 MB RAM
2018-08-31 21:25:51 INFO Master:54 - Registering app test.py
2018-08-31 21:25:51 INFO Master:54 - Registered app test.py with ID app-20180831212551-0000
2018-08-31 21:25:52 INFO Master:54 - Launching executor app-20180831212551-0000/0 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:52 INFO Master:54 - Launching executor app-20180831212551-0000/1 on worker worker-20180831212318-10.1.2.93-38905
2018-08-31 21:25:53 INFO Master:54 - Removing executor app-20180831212551-0000/0 because it is EXITED
2018-08-31 21:25:53 INFO Master:54 - Launching executor app-20180831212551-0000/2 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:55 INFO Master:54 - Removing executor app-20180831212551-0000/2 because it is EXITED
2018-08-31 21:25:55 INFO Master:54 - Launching executor app-20180831212551-0000/3 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:55 INFO Master:54 - Removing executor app-20180831212551-0000/1 because it is EXITED
2018-08-31 21:25:55 INFO Master:54 - Launching executor app-20180831212551-0000/4 on worker worker-20180831212318-10.1.2.93-38905
2018-08-31 21:25:56 INFO Master:54 - Removing executor app-20180831212551-0000/3 because it is EXITED
2018-08-31 21:25:56 INFO Master:54 - Launching executor app-20180831212551-0000/5 on worker worker-20180831212319-10.1.1.107-36421
```

I know the approach itself works in my pyspark script. Neither the driver logs nor the executor logs appear to contain any errors, so I have no clue what's going wrong, but they just keep scrolling as executors are launched and removed.
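As a sanity check (my suggestion, not part of the original question), the same script can be submitted in local mode, which runs the driver and executors in a single JVM and bypasses cluster networking entirely; if it succeeds there, the code itself is fine and the problem lies in the cluster setup:

```shell
# "local[*]" runs Spark in-process with one worker thread per CPU core,
# so no master, workers, or network ports are involved.
bin/spark-submit --master "local[*]" test.py
```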

Any insight would be greatly appreciated! Thanks!

1 Answer:

Answer 0 (Score: 0)

It turned out to be a networking problem. I was running my Spark workers, master, and driver in separate Docker containers and needed to expose the ports between them, in particular the ports for `spark.driver.port`, `spark.ui.port`, and `spark.blockManager.port`. I was able to get things working by following the Dockerfile and run scripts in this repo: https://github.com/tashoyan/docker-spark-submit
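As a rough sketch of the fix (the image name, network name, and port numbers here are illustrative assumptions, not taken from the post): by default `spark.driver.port` and `spark.blockManager.port` are chosen randomly at startup, so the idea is to pin them to fixed values with `--conf` and publish those ports from the driver's container, so that executors in the worker containers can connect back to the driver:

```shell
# Hypothetical setup: all containers attached to one user-defined Docker
# network so they can resolve each other by container name.
docker network create spark-net

# Pin the driver-side ports to fixed values and publish them from the
# container that runs spark-submit. Without this, executors launch,
# fail to reach the driver, and exit, which the master then logs as the
# endless launch/remove cycle seen in the question.
docker run --network spark-net \
    -p 7001:7001 -p 7002:7002 -p 4040:4040 \
    my-spark-image \
    bin/spark-submit \
      --master spark://spark-master:7077 \
      --conf spark.driver.port=7001 \
      --conf spark.blockManager.port=7002 \
      --conf spark.ui.port=4040 \
      test.py
```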

Thanks!