Spark standalone mode: workers do not stop correctly

Asked: 2013-06-18 14:50:53

Tags: scala, mapreduce, apache-spark

When stopping the whole cluster in Spark (0.7.0) with

$SPARK_HOME/bin/stop-all.sh

not all workers are stopped correctly. More specifically, if I then want to restart the cluster with

$SPARK_HOME/bin/start-all.sh

I get:

host1: starting spark.deploy.worker.Worker, logging to [...]
host3: starting spark.deploy.worker.Worker, logging to [...]
host2: starting spark.deploy.worker.Worker, logging to [...]
host5: starting spark.deploy.worker.Worker, logging to [...]
host4: spark.deploy.worker.Worker running as process 8104. Stop it first.
host7: spark.deploy.worker.Worker running as process 32452. Stop it first.
host6: starting spark.deploy.worker.Worker, logging to [...]

On host4 and host7 there is indeed a StandaloneExecutorBackend still running:

$ jps
27703 Worker
27763 StandaloneExecutorBackend
28601 Jps

Simply repeating

$SPARK_HOME/bin/stop-all.sh

unfortunately does not stop the workers either. Spark tells me the workers are about to be stopped:

host2: no spark.deploy.worker.Worker to stop
host7: stopping spark.deploy.worker.Worker
host1: no spark.deploy.worker.Worker to stop
host4: stopping spark.deploy.worker.Worker
host6: no spark.deploy.worker.Worker to stop
host5: no spark.deploy.worker.Worker to stop
host3: no spark.deploy.worker.Worker to stop
no spark.deploy.master.Master to stop

However,

$ jps
27703 Worker
27763 StandaloneExecutorBackend
28601 Jps

says otherwise. Does anyone have an idea how to make stop-all.sh work correctly? Thanks.
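For now, the leftover JVMs can be cleaned up by hand on the affected hosts. The snippet below is only a workaround sketch: it assumes jps is on the PATH and that every JVM named Worker or StandaloneExecutorBackend on those hosts belongs to this Spark installation.

# Run manually on each affected host (here: host4 and host7).
for pid in $(jps | grep -E 'Worker|StandaloneExecutorBackend' | awk '{print $1}'); do
    echo "stopping leftover Spark process $pid"
    kill "$pid"   # escalate to 'kill -9' only if the JVM ignores SIGTERM
done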

1 answer:

Answer 0 (score: 2)

The reason seems to be that the attempt to cache the whole dataset caused the worker machines to swap heavily. In this case, the number of worker machines was simply too small for the dataset.
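A JVM on a machine that is swapping heavily can become too unresponsive to react to the stop scripts in time, which would explain the leftover processes. As a rough check (assuming standard Linux tools such as free and vmstat are available on the worker hosts), swap activity can be inspected while the job runs:

free -m       # a large "used" value in the Swap row means the JVMs were pushed into swap
vmstat 5 3    # non-zero "si"/"so" columns indicate ongoing swap-in/swap-out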