Spark 1.5.1 standalone cluster - Exception in thread "main" akka.actor.ActorNotFound: Actor not found

Time: 2015-10-08 19:24:57

Tags: apache-spark

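I'm having trouble submitting jobs to the cluster, whether through spark-submit or from Java code. The jobs keep failing, and the stderr log (under SPARK_HOME/work/app_id..) always shows the same error: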

15/10/08 23:04:39 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@masternode:53411] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://sparkDriver@masternode:53411]] Caused by: [Connection refused: masternode/192.168.10.214:53411]
Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@masternode:53411/), Path(/user/MapOutputTracker)]
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
    at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
    at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
    at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
    at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
    at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
    at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
    at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
    at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
    at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
    at akka.actor.ActorCell.terminate(ActorCell.scala:369)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any clues as to what could be causing this? Running netstat shows that no process is listening on port 53411.
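The netstat result can be cross-checked from a worker node, since that is where the executor's connection attempt originates. Below is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and the host and port in the usage comment are simply the ones from the stderr log above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


# Example call (values from the log; the driver port changes on every run
# unless it is pinned):
#   can_connect("masternode", 53411)
```

A `False` result while the driver is still running would point at the network path or the driver's bind address; a `False` result after the driver has already exited only confirms that the driver went away first.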

1 Answer:

Answer 0 (score: 0)

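I guess the error message:

  Association with remote system [akka.tcp://sparkDriver@masternode:53411] has failed

is telling you that there is a problem with the communication between the driver and the workers.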

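I have run into this error before. My suggestions are:

  1. Make sure your master address is correct; double-check the master URL (for a standalone cluster it has the form spark://HOST:PORT).
  2. Check the firewall and iptables to make sure the ports are not blocked. Spark uses randomly chosen port numbers for communication.
  3. Make sure you have enough memory; a lack of resources can sometimes produce a similar error.
  4. You can monitor your cluster status on ports 4040 and 18080; that may also give you some useful clues.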

    http://<server-url>:18080
    
    http://<driver-node>:4040
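Suggestions 1 and 2 can be combined by pinning the driver's address instead of letting Spark pick a random port, so that a firewall rule can target it. The sketch below only assembles the spark-submit argument list. `spark.driver.host` and `spark.driver.port` are standard Spark configuration properties; the function name, port 51000, the jar name, and the class name are made-up placeholders, and other Spark ports may need the same treatment depending on your setup.

```python
def build_submit_command(master_url, driver_host, driver_port, app_jar, main_class):
    """Assemble a spark-submit command line that pins the driver host and port."""
    # Standalone masters are addressed as spark://HOST:PORT.
    if not master_url.startswith("spark://"):
        raise ValueError("expected a standalone master URL like spark://HOST:PORT")
    return [
        "spark-submit",
        "--master", master_url,
        "--class", main_class,
        # Address and port the executors use to reach the driver.
        "--conf", "spark.driver.host=%s" % driver_host,
        "--conf", "spark.driver.port=%d" % driver_port,
        app_jar,
    ]


# Hypothetical usage; every value here is a placeholder:
#   build_submit_command("spark://masternode:7077", "192.168.10.214",
#                        51000, "app.jar", "com.example.Main")
```

The returned list can be passed to `subprocess.run` as is. With a fixed port, the firewall check in suggestion 2 becomes a single rule rather than a range.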