Spark Cluster,无法连接到master。 (WARN工作者:无法连接到主人)

时间:2015-12-18 22:46:09

标签: apache-spark

我有一个包含2个节点的火花群,master(172.17.0.229)slave(172.17.0.228)。我已经修改了spark-env.sh,添加了SPARK_MASTER_IP=127.17.0.229和奴隶,添加了172.17.0.228

我正在使用start-master.sh和从节点使用start-slaves.sh启动主节点。

我可以看到webUI的主节点没有worker,但是worker节点的日志是:

Spark Command: /usr/lib/jvm/java-7-oracle/jre/bin/java -cp /usr/local/src/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/usr/local/src/spark-1.5.2-bin-hadoop$
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/18 14:17:25 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/12/18 14:17:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/18 14:17:26 INFO SecurityManager: Changing view acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: Changing modify acls to: ujjwal
15/12/18 14:17:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ujjwal); users wit$
15/12/18 14:17:27 INFO Slf4jLogger: Slf4jLogger started
15/12/18 14:17:27 INFO Remoting: Starting remoting
15/12/18 14:17:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@172.17.0.228:47599]
15/12/18 14:17:27 INFO Utils: Successfully started service 'sparkWorker' on port 47599.
15/12/18 14:17:27 INFO Worker: Starting Spark worker 172.17.0.228:47599 with 2 cores, 2.7 GB RAM
15/12/18 14:17:27 INFO Worker: Running Spark version 1.5.2
15/12/18 14:17:27 INFO Worker: Spark home: /usr/local/src/spark-1.5.2-bin-hadoop2.6
15/12/18 14:17:27 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/12/18 14:17:27 INFO WorkerWebUI: Started WorkerWebUI at http://172.17.0.228:8081
15/12/18 14:17:27 INFO Worker: Connecting to master 127.17.0.229:7077...
15/12/18 14:17:27 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@127.17.0.229:7077] has failed, address is now$
15/12/18 14:17:27 WARN Worker: Failed to connect to master 127.17.0.229:7077
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster@127.17.0.229:7077/), Path(/user/Master)]
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
        at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
        at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
        at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
        at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
        at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
        at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)

感谢您的建议。

2 个答案:

答案 0 :(得分:6)

通常,在172.17.0.229端口8080的Web UI上检查您的工作者尝试连接的IP与报告的SPARK_MASTER_IP=127.17.0.229 地址将有助于确定该地址是否正确。

在这种特殊情况下,看起来你有一个错字;变化

SPARK_MASTER_IP=172.17.0.229

阅读:

 @Override
    public void onClick(View view) {
        AlertDialog.Builder builder = new AlertDialog.Builder(view.getContext());
          //do your stuff here.
}

(你似乎倒了127/172)。

答案 1 :(得分:1)

我的问题是我使用的spark java库(2.0.0)和spark集群版本(2.2.1)之间的版本不匹配