Spark worker retries connecting to master but receives java.util.concurrent.RejectedExecutionException

Time: 2016-03-08 03:04:14

Tags: java apache-spark

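I set up a cluster on 12 machines, and the Spark workers on the slave machines disassociate from the master every day. They appear to work normally for part of the day, but then all of the slaves disassociate and are shut down.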
The worker log looks like this:

16/03/07 12:45:34.828 INFO Worker: Retrying connection to master (attempt # 1)
16/03/07 12:45:34.830 INFO Worker: Connecting to master host1:7077...
16/03/07 12:45:34.826 INFO Worker: Retrying connection to master (attempt # 2)
16/03/07 12:45:45.830 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-2,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1c5651e9 rejected from java.util.concurrent.ThreadPoolExecutor@671ba687[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 2]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
    ...
16/03/07 12:45:45.853 Info ExecutorRunner: Killing process!
16/03/07 12:45:45.855 INFO ShutdownHookManager: Shutdown hook called

The master log looks like this:

16/03/07 12:45:45.878 INFO Master: 10.126.217.11:51502 got disassociated, removing it.
16/03/07 12:45:45.878 INFO Master: Removing worker worker-20160303035822-10.126.217.11-51502 on 10.126.217.11:51502
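
For reference, the pool state printed in the exception (Running, pool size = 1, active threads = 1, queued tasks = 0) is what a plain ThreadPoolExecutor reports when its only thread is busy and it has no queue capacity. The following is a minimal standalone sketch of that JDK behavior, not Spark code. The class name RejectionDemo, the method secondSubmitRejected and the pool settings (one thread, SynchronousQueue) are assumptions chosen for illustration; whether the Spark worker builds its pool this way is not confirmed by the logs above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionDemo {

    // Returns true if a second task is rejected while the single pool thread is busy.
    public static boolean secondSubmitRejected() throws InterruptedException {
        // Assumed settings: at most one thread and a queue with no capacity.
        // The default rejection handler is AbortPolicy, which throws.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // First task occupies the only thread until it is released.
            pool.submit(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            try {
                // No idle thread, no queue slot, no room to grow: rejected.
                pool.submit(() -> { });
                return false;
            } catch (RejectedExecutionException e) {
                System.out.println(e.getMessage());
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected = " + secondSubmitRejected());
    }
}
```

If the worker's pool behaves like this sketch, the rejection would point to a second task being submitted while an earlier one was still running, rather than directly to memory pressure. That reading is only a guess from the exception text.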

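Machine information:
Each machine has 40 cores and 256 GB of memory
Spark version: 1.5.1
Java version: 1.8.0_45
Spark runs on this cluster with the following configuration:
spark.cores.max=360
spark.executor.memory=32g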

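Is this a memory problem on the slaves or on the master?
Or is it a network problem between the slaves and the master? Or is it something else entirely?

Please advise.
Thanks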

0 answers:

No answers