apache-spark: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds

Time: 2016-02-19 08:06:17

Tags: apache-spark

I have set up a Spark cluster in standalone mode; the cluster configuration was generated automatically. I can see that both workers are running, but when I start a spark-shell I run into the following problem:

val lines = sc.parallelize(List(1, 2, 3, 4))

This works fine and the new RDD is created, but when I run the next action,

lines.take(2).foreach(println) 

I get this error and cannot figure out how to solve it:

Output:

 16/02/18 10:27:02 INFO DAGScheduler: Got job 0 (take at :24) with 1 output partitions
 16/02/18 10:27:02 INFO DAGScheduler: Final stage: ResultStage 0 (take at :24)
 16/02/18 10:27:02 INFO DAGScheduler: Parents of final stage: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Missing parents: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :21), which has no missing parents
 16/02/18 10:27:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1288.0 B, free 1288.0 B)
 16/02/18 10:27:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 856.0 B, free 2.1 KB)

About a minute and a half later:

16/02/18 10:28:43 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(0,java.io.IOException: Failed to create directory /srv/spark/work/app-20160218102438-0000/0)] in 2 attempts org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) 
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.complete(Promise.scala:55) 
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) 
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) 
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153) 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242) ... 7 more

In one of the workers I can see this error in the log:

Invalid maximum heap size: -Xmxm0M could not create JVM

I can also see some issues that I think are related to port binding or something similar.

2 answers:

Answer 0 (score: 3)

The problem is most likely in the area of port visibility between the Spark cluster and the client instance.

It looks strange from the user's side, but it is a feature of Spark's architecture: every Spark node must be able to reach the client instance on the specific port defined by spark.driver.port in the SparkContext configuration. By default this option is empty, which means the port is chosen at random, so with the default configuration every Spark node must be able to reach the client instance on any port. However, you can override spark.driver.port.
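
For example, you can pin the relevant ports when creating the context. This is a minimal sketch; the master URL, host name, and port numbers are placeholders you would replace with values reachable through your firewall:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin the ports that executors must reach on the client (driver) machine,
    // instead of letting Spark choose random ones at startup.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")    // placeholder standalone master URL
      .setAppName("port-visibility-test")
      .set("spark.driver.port", "51000")        // RPC port executors connect back to
      .set("spark.blockManager.port", "51001")  // port used for block/data transfers

    val sc = new SparkContext(conf)

With the ports fixed, you know exactly which ones have to be open between the workers and the client.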

Problems can arise if your client machine is behind a firewall or inside a Docker container. In that case you need to open this port to the outside.

Answer 1 (score: 1)

If you are running inside a VM, give it at least 2 CPU cores; you have to set this in the VM configuration.
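
As a quick sanity check (a small sketch, assuming you can still open a spark-shell on that machine), you can verify how many cores the JVM actually sees:

    // Paste into spark-shell: report the cores visible to the JVM.
    // With only one core, a standalone worker may have nothing left
    // to offer an executor, so tasks are never scheduled.
    println(s"Available processors: ${Runtime.getRuntime.availableProcessors()}")
    println(s"Default parallelism:  ${sc.defaultParallelism}")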