Spark worker nodes started but not showing in the Web UI

Time: 2017-07-05 03:33:19

Tags: apache-spark cassandra

I'm trying to set up a small 3-node Spark cluster using a couple of Raspberry Pis and my main desktop, but I can't seem to get the Pis to talk to my master node (the desktop). The networking itself is configured correctly, since I'm also running Cassandra (open source, not DSE) across all three nodes. The master's Web UI only shows the master machine; I can browse to each worker's web UI address and get its own page, but the workers don't seem to know about my master node. My slaves file lists every worker node. I feel like I'm missing one small thing to make this work, and any suggestions would be greatly appreciated. Below are some logs and any other information I could think of that might be useful, while trying to keep this short and concise.

The spark-env.sh on all nodes is as follows (except that the local IP is adjusted appropriately on each node):

export SPARK_WORKER_CORES=6
export SPARK_MASTER_HOST=192.168.0.106
export SPARK_LOCAL_IP=192.168.0.201
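
For reference, the slaves file mentioned above lives at conf/slaves on the master and lists one worker host or IP per line. A minimal sketch, assuming the worker addresses of this setup (192.168.0.201 appears in the logs below; the second Pi's address is hypothetical):

# conf/slaves on the master -- one worker host or IP per line
192.168.0.201
# hypothetical address of the second Pi
192.168.0.202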

Log from a worker node:

Spark Command: /usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt/jre/bin/java -cp /home/spark/spark/conf/:/home/spark/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://Palehorse:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/05 03:22:40 INFO Worker: Started daemon with process name: 11065@PiCamp1
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for TERM
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for HUP
17/07/05 03:22:40 INFO SignalUtils: Registered signal handler for INT
17/07/05 03:22:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/05 03:22:42 INFO SecurityManager: Changing view acls to: spark
17/07/05 03:22:42 INFO SecurityManager: Changing modify acls to: spark
17/07/05 03:22:42 INFO SecurityManager: Changing view acls groups to: 
17/07/05 03:22:42 INFO SecurityManager: Changing modify acls groups to: 
17/07/05 03:22:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(spark); groups with view permissions: Set(); users  with modify permissions: Set(spark); groups with modify permissions: Set()
17/07/05 03:22:43 INFO Utils: Successfully started service 'sparkWorker' on port 35342.
17/07/05 03:22:44 INFO Worker: Starting Spark worker 192.168.0.201:35342 with 6 cores, 1024.0 MB RAM
17/07/05 03:22:44 INFO Worker: Running Spark version 2.1.1
17/07/05 03:22:44 INFO Worker: Spark home: /home/spark/spark
17/07/05 03:22:45 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/07/05 03:22:45 INFO WorkerWebUI: Bound WorkerWebUI to 192.168.0.201, and started at http://192.168.0.201:8081
17/07/05 03:22:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:22:51 INFO Worker: Retrying connection to master (attempt # 1)
17/07/05 03:22:51 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:22:57 INFO Worker: Retrying connection to master (attempt # 2)
17/07/05 03:22:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:03 INFO Worker: Retrying connection to master (attempt # 3)
17/07/05 03:23:03 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:09 INFO Worker: Retrying connection to master (attempt # 4)
17/07/05 03:23:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:15 INFO Worker: Retrying connection to master (attempt # 5)
17/07/05 03:23:15 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:21 INFO Worker: Retrying connection to master (attempt # 6)
17/07/05 03:23:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:23:57 INFO Worker: Retrying connection to master (attempt # 7)
17/07/05 03:23:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:24:33 INFO Worker: Retrying connection to master (attempt # 8)
17/07/05 03:24:33 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:24:45 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:24:45 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:24:45 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    ... 4 more
17/07/05 03:25:09 INFO Worker: Retrying connection to master (attempt # 9)
17/07/05 03:25:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:25:45 INFO Worker: Retrying connection to master (attempt # 10)
17/07/05 03:25:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:26:21 INFO Worker: Retrying connection to master (attempt # 11)
17/07/05 03:26:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:26:57 INFO Worker: Retrying connection to master (attempt # 12)
17/07/05 03:26:57 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:27:09 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:27:09 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:27:09 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    ... 4 more
17/07/05 03:27:33 INFO Worker: Retrying connection to master (attempt # 13)
17/07/05 03:27:33 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:28:09 INFO Worker: Retrying connection to master (attempt # 14)
17/07/05 03:28:09 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:28:45 INFO Worker: Retrying connection to master (attempt # 15)
17/07/05 03:28:45 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:29:21 INFO Worker: Retrying connection to master (attempt # 16)
17/07/05 03:29:21 INFO Worker: Connecting to master Palehorse:7077...
17/07/05 03:29:33 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
17/07/05 03:29:33 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
17/07/05 03:29:33 WARN Worker: Failed to connect to master Palehorse:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connecting to Palehorse/198.105.254.63:7077 timed out (120000 ms)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:229)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    ... 4 more
17/07/05 03:29:57 ERROR Worker: All masters are unresponsive! Giving up.

1 Answer:

Answer 0 (score: 0)

I was finally able to get the slaves talking to the master. It seems to have been a combination of things: one problem was that my /etc/hosts file had the master's hostname mapped to the 127.0.1.1 address, and the other problem was indeed with start-all.sh. I solved it by running start-slave.sh spark://<master ip address>:7077 on each worker.
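
For anyone hitting the same combination, a minimal sketch of the two fixes, using the master hostname and addresses that appear in the logs and config above (adjust to your own network):

# /etc/hosts on the master: make the hostname resolve to the LAN address
# that the workers can reach, not the 127.0.1.1 loopback entry
# 127.0.1.1    Palehorse      <- remove or comment out this line
192.168.0.106  Palehorse

# then, on each worker, register against the master explicitly
$SPARK_HOME/sbin/start-slave.sh spark://192.168.0.106:7077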
