我有一个PySpark作业,可以在小型集群中成功运行,但在启动时的前几分钟内会出现很多以下错误。关于如何解决它的任何想法?这是使用PySpark 2.2.0和mesos。
17/09/29 18:54:26 INFO Executor: Running task 5717.0 in stage 0.0 (TID 5717)
17/09/29 18:54:26 INFO CoarseGrainedExecutorBackend: Got assigned task 5813
17/09/29 18:54:26 INFO Executor: Running task 5813.0 in stage 0.0 (TID 5813)
17/09/29 18:54:26 INFO CoarseGrainedExecutorBackend: Got assigned task 5909
17/09/29 18:54:26 INFO Executor: Running task 5909.0 in stage 0.0 (TID 5909)
17/09/29 18:54:56 ERROR TransportClientFactory: Exception while bootstrapping client after 30001 ms
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.spark_project.guava.base.Throwables.propagate(Throwables.java:160)
at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:275)
at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:70)
at org.apache.spark.network.crypto.AuthClientBootstrap.doSaslAuth(AuthClientBootstrap.java:117)
at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:76)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:244)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:366)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:332)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:654)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:467)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$3.apply(Executor.scala:684)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$3.apply(Executor.scala:681)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:681)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:308)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Timeout waiting for task.
at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:276)
at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
at org.apache.spark.network.client.TransportClient.sendRpcSync(TransportClient.java:271)
... 23 more
17/09/29 18:54:56 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /10.0.1.25:37314 is closed
17/09/29 18:54:56 INFO Executor: Fetching spark://10.0.1.25:37314/files/djinn.spark.zip with timestamp 1506711239350