I have a Spark cluster (7 workers * 2 cores) set up with Spark 2.0.2, running alongside an HDFS cluster.
When I read some HDFS files with Jupyter, I see the application start, using 14 cores and 3 executors, but none of the workers can launch any task, because the network cannot connect to a strange "localhost" on port 35529.
spark = SparkSession.builder \
    .master(master) \
    .appName(appName) \
    .config("spark.executor.instances", 3) \
    .getOrCreate()
sc = spark.sparkContext

hdfs_master = "hdfs://xx.xx.xx.xx:8020"
hdfs_path = "/logs/cycliste_debug/2017/2017_02/2017_02_20/23h/*"
infos = sc.textFile(hdfs_master + hdfs_path)
(It also seems strange to me to see 14 cores allocated when only 3 * 2 should be possible, i.e. spark.executor.instances * the number of cores per node.)
Here is the cluster summary:
Executor summary for app-20170227140938-0009:
ExecutorID Worker Cores Memory State Logs
1488 worker-20170227125912-xx.xx.xx.xx-38028 2 1024 RUNNING stdout stderr
1489 worker-20170227125954-xx.xx.xx.xx-48962 2 1024 RUNNING stdout stderr
5 worker-20170227125959-xx.xx.xx.xx-48149 2 1024 RUNNING stdout stderr
1486 worker-20170227130012-xx.xx.xx.xx-47639 2 1024 RUNNING stdout stderr
1490 worker-20170227130027-xx.xx.xx.xx-44921 2 1024 RUNNING stdout stderr
1485 worker-20170227130152-xx.xx.xx.xx-50620 2 1024 RUNNING stdout stderr
1487 worker-20170227130248-xx.xx.xx.xx-42100 2 1024 RUNNING stdout stderr
And an example of the error from one of the workers:
stderr log page for app-20170227140938-0009/1488:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/27 14:37:57 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 5864@vpsxxx.ovh.net
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for TERM
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for HUP
17/02/27 14:37:57 INFO SignalUtils: Registered signal handler for INT
17/02/27 14:37:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/27 14:37:58 INFO SecurityManager: Changing view acls to: spark
17/02/27 14:37:58 INFO SecurityManager: Changing modify acls to: spark
17/02/27 14:37:58 INFO SecurityManager: Changing view acls groups to:
17/02/27 14:37:58 INFO SecurityManager: Changing modify acls groups to:
17/02/27 14:37:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); groups with view permissions: Set(); users with modify permissions: Set(spark); groups with modify permissions: Set()
17/02/27 14:38:01 WARN ThreadLocalRandom: Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:35529
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:35529
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
I understand that there is a simple communication problem between the two processes.
So here is my /etc/hosts:
127.0.0.1 localhost
193.xx.xx.xxx vpsxxxx.ovh.net vpsxxxx
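For what it's worth, a quick hypothetical diagnostic (not something I ran in the original setup) to see what address the driver machine resolves its own hostname to:

```python
import socket

# If the hostname resolves to 127.0.0.1, Spark may advertise a loopback
# address to the executors, which would match the
# "Failed to connect to localhost/127.0.0.1:35529" error above.
hostname = socket.gethostname()
print(hostname, socket.gethostbyname(hostname))
```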
Any ideas?
Answer 0 (score: 0)
Check that SPARK_LOCAL_IP is set to the correct IP on each slave.
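As a rough sketch (the IP below is a placeholder, not from the question), this is typically set in conf/spark-env.sh on each node:

```shell
# Hypothetical example: in $SPARK_HOME/conf/spark-env.sh on each worker
# (and possibly on the driver machine as well), bind Spark to that node's
# routable address instead of whatever resolves to localhost:
export SPARK_LOCAL_IP=193.xx.xx.xxx
```

The workers need to be restarted for the setting to take effect. Since the stack trace shows executors trying to reach localhost/127.0.0.1:35529, the driver itself may also be advertising a loopback address; setting spark.driver.host to the driver's real IP in the SparkSession config is another option worth checking.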