Question

I have submitted my Spark job on the ambari-server host using the following command:

  ./spark-submit --class customer.core.classname --master yarn --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667

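It works fine.

But how can the job keep running if we close the command prompt or try to kill the submitting process? It has to keep running.

Any help is appreciated.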

Answer 1

You can achieve this in a couple of ways.

1) You can use nohup to run the spark-submit driver process in the background, for example:

nohup ./spark-submit --class customer.core.classname \
  --master yarn --num-executors 2 \
  --driver-memory 2g --executor-memory 2g --executor-cores 1 \
  /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
  newdata host:6667 &

2) Run with the deploy mode set to cluster, so that the driver process runs on a different node.
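
A minimal sketch of option 2, assuming the same job as in the question. The only change is `--deploy-mode cluster`, which asks YARN to run the driver on a cluster node instead of inside the submitting shell. The function name `submit_cluster` and the `SPARK_SUBMIT` variable are hypothetical helpers added for illustration, and the executor-count flag is spelled `--num-executors`, as in the spark-submit help.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the job from the question in cluster deploy mode.
# SPARK_SUBMIT is an assumed override for the launcher path (defaults to ./spark-submit).
submit_cluster() {
  "${SPARK_SUBMIT:-./spark-submit}" \
    --class customer.core.classname \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 2 \
    --driver-memory 2g \
    --executor-memory 2g \
    --executor-cores 1 \
    /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
    newdata host:6667
}

# To actually submit, call the function:
#   submit_cluster
```

Setting `SPARK_SUBMIT=echo` before calling `submit_cluster` prints the arguments instead of submitting, which is a cheap way to check the command line first.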

Answer 2

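I think this question is more about the shell than about Spark.

To keep an application running even after the shell is closed, you should add & at the end of the command. So your spark-submit command becomes (just add & at the end):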

./spark-submit --class customer.core.classname --master yarn --num-executors 2 --driver-memory 2g --executor-memory 2g --executor-cores 1 /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar newdata host:6667 &
[1] 28299

You will still see the logs and output messages in the terminal unless you redirect them.
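
The redirection mentioned above could be sketched as follows. This is not part of the original answer: `run_detached` is a hypothetical wrapper that starts any command under nohup in the background, sends both stdout and stderr to a log file of your choice, and prints the PID of the background process.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND under nohup in the background, writes its stdout and stderr
# to LOGFILE, and prints the PID of the background process.
run_detached() {
  local log_file=$1
  shift
  nohup "$@" > "$log_file" 2>&1 &
  echo $!
}

# Example call (not executed here; spark-submit.log is an assumed file name):
#   run_detached spark-submit.log ./spark-submit --class customer.core.classname \
#     --master yarn --num-executors 2 --driver-memory 2g --executor-memory 2g \
#     --executor-cores 1 /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
#     newdata host:6667
```

With the output going to a file, `tail -f spark-submit.log` can be used from any later session to follow the job.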

Answer 3

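I hope I understood the question. In general, if you want a process to keep running, you can create a process file that runs in the background. In your case, the job will keep running until you explicitly kill it with yarn application -kill followed by its application ID. So even if you kill spark-submit, the job will continue to run, because YARN manages it after submission.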
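
As a rough sketch of how the job can be found and stopped once YARN manages it: `yarn application -list` and `yarn application -kill` are standard YARN CLI subcommands, while the helper names `find_app_ids` and `kill_app` and the `YARN` override variable are hypothetical. The sketch assumes the application ID is the first column of the listing and that the name you pass appears somewhere on the job's line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of running YARN applications whose
# listing line contains the given text (for example the main class name).
# YARN is an assumed override for the CLI path (defaults to yarn).
find_app_ids() {
  "${YARN:-yarn}" application -list -appStates RUNNING 2>/dev/null |
    awk -v name="$1" 'index($0, name) && $1 ~ /^application_/ { print $1 }'
}

# Hypothetical helper: ask YARN to kill one application by its ID.
kill_app() {
  "${YARN:-yarn}" application -kill "$1"
}

# Example (not executed here):
#   for id in $(find_app_ids customer.core.classname); do kill_app "$id"; done
```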

Answer 4

Warning: I have not tested this. However, a better way to do what you describe is probably to use the following settings:

--deploy-mode cluster \
--conf spark.yarn.submit.waitAppCompletion=false

Found here: How to exit spark-submit after the submission
