我遇到了Spark问题。我有一个带有2个节点的Spark独立集群,
master: 121.*.*.22(hostname is iZ28i1niuigZ)
worker: 123.*.*.125(hostname is VM-120-50-ubuntu)
。我已编辑slaves
个文件并添加了123.*.*.125
。
WebUI上没有工作人员信息:
Nested comments in PHP & MySQL
执行启动脚本时,我看到以下消息:
spark@iZ28i1niuigZ:~/spark-2.0.1-bin-hadoop2.7$ sh sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/spark/spark-2.0.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.master.Master-1-iZ28i1niuigZ.out
123.*.*.125: starting org.apache.spark.deploy.worker.Worker, logging to /home/spark/spark-2.0.1-bin-hadoop2.7/logs/spark-spark-org.apache.spark.deploy.worker.Worker-1-VM-120-50-ubuntu.out
spark-env.sh
文件内容为:
export SPARK_MASTER_IP=121.*.*.22
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORDER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
export JAVA_HOME=/home/spark/jdk1.8.0_101
在工人身上,我可以看到以下输出:
Spark Command: /home/spark/jdk1.8.0_101/bin/java -cp /home/spark/spark-2.0.1-bin-hadoop2.7/conf/:/home/spark/spark-2.0.1-bin-hadoop2.7/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://iZ28i1niuigZ:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/11/30 20:04:56 INFO Worker: Started daemon with process name: 28287@VM-120-50-ubuntu
16/11/30 20:04:56 INFO SignalUtils: Registered signal handler for TERM
16/11/30 20:04:56 INFO SignalUtils: Registered signal handler for HUP
16/11/30 20:04:56 INFO SignalUtils: Registered signal handler for INT
16/11/30 20:04:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/30 20:04:56 INFO SecurityManager: Changing view acls to: spark
16/11/30 20:04:56 INFO SecurityManager: Changing modify acls to: spark
16/11/30 20:04:56 INFO SecurityManager: Changing view acls groups to:
16/11/30 20:04:56 INFO SecurityManager: Changing modify acls groups to:
16/11/30 20:04:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
16/11/30 20:04:57 INFO Utils: Successfully started service 'sparkWorker' on port 41544.
16/11/30 20:04:57 INFO Worker: Starting Spark worker 10.141.120.50:41544 with 1 cores, 1024.0 MB RAM
16/11/30 20:04:57 INFO Worker: Running Spark version 2.0.1
16/11/30 20:04:57 INFO Worker: Spark home: /home/spark/spark-2.0.1-bin-hadoop2.7
16/11/30 20:04:57 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
16/11/30 20:04:57 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://10.141.120.50:8081
16/11/30 20:04:57 INFO Worker: Connecting to master iZ28i1niuigZ:7077...
16/11/30 20:04:58 WARN Worker: Failed to connect to master iZ28i1niuigZ:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.jav a:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to iZ28i1niuigZ/121.*.*.22:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.ConnectException: Connection refused: iZ28i1niuigZ/121.*.*.22:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/11/30 20:05:08 INFO Worker: Retrying connection to master (attempt # 1)
16/11/30 20:05:08 INFO Worker: Connecting to master iZ28i1niuigZ:7077...
16/11/30 20:05:08 WARN Worker: Failed to connect to master iZ28i1niuigZ:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout .scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:96)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to iZ28i1niuigZ/121.*.*.22:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.net.ConnectException: Connection refused: iZ28i1niuigZ/121.*.*.22:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
主节点上的/etc/hosts
如下所示:
127.0.0.1 localhost
127.0.1.1 localhost.localdomain localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.251.33.226 iZ28i1niuigZ
123.*.*.125 VM-120-50-ubuntu
/etc/hosts
工作节点包含以下配置:
10.141.120.50 VM-120-50-ubuntu
127.0.0.1 localhost localhost.localdomain
121.*.*.22 iZ28i1niuigZ
我无法理解为什么工人无法连接到主人?
=============================================== ========================= 更新:
我不能telnet 123.*.*.125 7077
,而我可以telnet 123.*.*.125
执行命令iptables -L -n
时,我看到以下消息:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination