我正在尝试安装独立的spark群集。我准备了3个虚拟机,每个虚拟机都安装了Ubuntu。
这三台机器由一个主机和两个从机组成。
我按照Apache spark文档中的步骤进行操作。我从主节点启动了主脚本,它工作正常。
问题发生在奴隶身上。
我尝试使用每台计算机上的sbin/start-slave.sh
启动从站,然后使用主节点中的sbin/start-slaves.sh
再次启动。
从属节点上的worker无法启动并抛出以下异常
15/11/06 02:12:36 WARN Worker: Failed to connect to master rethink-node01:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@rethink-node01:7077/), Path(/user/Master)]
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
at akka.actor.ActorCell.terminate(ActorCell.scala:369)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
答案 0 :(得分:4)
我在设置独立的火花星团时遇到了这个问题。对我来说,这是由于纯网络设置问题引起的。
从奴隶机器尝试做
telnet MASTER_IP 7077
如果未建立连接,则需要检查防火墙设置或在我的情况下检查火花端口7077正在侦听的位置。虽然它不起作用,但这是我的netstat的输出。
netstat -an | grep 7077
tcp 0 0 127.0.0.2:7077 ::: * LISTEN
我从/ etc / hosts中删除了有问题的127.0.0.2条目,然后一切正常。
netstat -an | grep 7077
tcp 0 0 1.1.1.1:7077 ::: * LISTEN