I have two machines on a network. On the master machine I am running ./sbin/start-master.sh, and on the other (let's call it the slave machine) I am running ./bin/spark-shell --master spark://master-machine:7077. But I get the following error when running the spark-shell script on the slave machine:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/09/07 18:56:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/07 18:56:38 WARN Utils: Your hostname, <HOSTNAME> resolves to a loopback address: 127.0.0.1; using 192.168.0.68 instead (on interface wlan0)
16/09/07 18:56:38 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/09/07 18:56:39 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master <master-machine's IP>:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:109)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to /192.168.43.27:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.ConnectException: Connection refused: /<master-machine's IP>:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
I also tried using the master's IP directly, but that didn't work either.
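Concretely, that attempt looked like this (using the master's IP that also appears in the stack trace above):

./bin/spark-shell --master spark://192.168.43.27:7077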
On the master machine, netstat -nltu shows the following:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 127.0.1.1:6066 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 127.0.1.1:7077 :::* LISTEN
udp 0 0 0.0.0.0:27535 0.0.0.0:*
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp6 0 0 :::54678 :::*
Port 7077 is bound to 127.0.1.1, so it only accepts packets from localhost, not from other machines on the network.
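My understanding (an assumption on my part, consistent with the WARN Utils line in the log above) is that Spark derives its bind address from the machine's hostname, and on Debian/Ubuntu systems /etc/hosts typically maps the hostname to a loopback address, roughly like this:

# typical Debian/Ubuntu /etc/hosts layout
127.0.0.1    localhost
127.0.1.1    <HOSTNAME>

So the master ends up listening only on the loopback interface.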
I tried most of the solutions I found online, but none of them worked until I set SPARK_MASTER_HOST to the master machine's IP (see the snippet after the output below). After that, netstat -nltu shows:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 <master-machine IP>:6066 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 <master-machine IP>:7077 :::* LISTEN
udp 0 0 0.0.0.0:27535 0.0.0.0:*
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp6 0 0 :::54678 :::*
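For reference, this is roughly how I made the change on the master (conf/spark-env.sh is read by the startup scripts; the IP below stands in for the master machine's real address):

# conf/spark-env.sh on the master machine
export SPARK_MASTER_HOST=<master-machine IP>

# then restart the master so the setting takes effect
./sbin/stop-master.sh
./sbin/start-master.sh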
The problem is solved now, but what is the best practice for fixing this? The reason it strikes me as odd is that you would reasonably expect port 7077 to accept packets from other machines on the network, not just from localhost.
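After the change, a quick way to verify reachability from the slave machine (assuming netcat is installed) is:

nc -zv <master-machine IP> 7077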
PS: The Spark version is spark-2.0.0-bin-hadoop2.7.