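An Apache Spark cluster is usually launched with the shell scripts that ship with the distribution (the sbin/start-*.sh scripts). The problem is that every time the cluster machines shut down and come back up, those scripts have to be run again by hand to start the Spark cluster.
Supervisord is well suited to managing processes, and it looks like an ideal way to start the Spark processes automatically after a reboot.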
However, after starting the master process through supervisord with

command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

and the worker process with

command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077

and then submitting a Spark application, I end up with the following errors:
15/06/05 17:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 7
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 9
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
15/06/05 17:16:32 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
Does anyone know how to manage the Spark processes with supervisord?
I am also open to alternative solutions.
Answer 0 (score: 5)
The Spark master can be run in the foreground with

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080

and the worker with

command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
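
For context, supervisord expects the programs it manages to stay in the foreground rather than daemonize, which is why the spark-class invocations above fit it. As a rough sketch, the two commands might sit in a supervisord configuration like the one below. The program names (spark-master, spark-worker) and the log file paths are assumptions made for illustration and do not come from the original posts; the master section would normally live on the master host and the worker section on each worker host.

```ini
; Sketch only: program names and log paths are hypothetical, adjust to your hosts.
[program:spark-master]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080
; start with supervisord (and therefore at boot, if supervisord itself starts at boot)
autostart=true
; restart the process if it exits
autorestart=true
; signal the whole process group so the JVM behind spark-class is stopped too
stopasgroup=true
killasgroup=true
; assumed log location
stdout_logfile=/var/log/spark/master.log
redirect_stderr=true

[program:spark-worker]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
; assumed log location
stdout_logfile=/var/log/spark/worker.log
redirect_stderr=true
```

After editing the configuration, `supervisorctl reread` followed by `supervisorctl update` should pick up the new programs, and `supervisorctl status` shows whether they are running.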