PySpark dropping all connections

Date: 2017-01-19 03:46:49

Tags: apache-spark pyspark

I'm trying to diagnose some strange connection problems on my Spark grid: I'm seeing an enormous number of dropped connections.

I'm running something that looks like this on a distributed PySpark cluster:

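# Python 3 removed tuple parameters in lambdas (PEP 3113), so the (key, value) pair is unpacked by index.
spark_context.parallelize(tasks) \
                .map(lambda kwargs: my_mapped_fn(**kwargs)) \
                .reduceByKey(my_reduce_by_key) \
                .map(lambda kv: (kv[0], my_final_map(kv[0], kv[1]))) \
                .reduce(my_final_reduce)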

I'm fairly sure it is failing in the my_final_map step, so I suspect the transfers are being shut down, and that is why so many of my jobs fail.

Here is the error I get:

java.io.IOException: Failed to connect to 10.12.9.117:38103
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:97)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
    at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:106)
    at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:92)
    at org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:579)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
    at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.12.9.117:38103
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:640)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    ... 1 more
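One way to narrow down a "Connection refused" like the one in this trace is to check, from the driver machine, whether the executor address and port it names accept TCP connections at all. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper, not part of Spark, and because executor block manager ports like 38103 are random by default, the check only means something while that executor is still running.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper, not a Spark API.
    """
    try:
        # create_connection resolves `host` and tries each address it gets back.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refusals, timeouts and name-resolution failures are all OSError subclasses.
        return False
```

For example, `can_connect("10.12.9.117", 38103)` run on the driver while the job is active returns False if the connection is refused, times out, or the name does not resolve. A refusal means nothing is listening at that exact address and port: the executor is gone, a firewall is rejecting the connection, or the driver is reaching a different host or interface than the one the executor listens on.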

1 answer:

Answer 0 (score: 0)

In case anyone finds this useful: the actual cause had nothing to do with Spark at all. IP address lookup on some of the nodes was broken.
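The answer does not say exactly how the lookup was broken. As a minimal sketch, assuming the failure is of a common kind (a node's own hostname not resolving, or resolving to a loopback address, for instance through an /etc/hosts entry for 127.0.1.1), a per-node check could look like the following. `resolve_all` and `lookup_problems` are hypothetical helpers built only on the standard library.

```python
import ipaddress
import socket
from typing import List


def resolve_all(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses `hostname` resolves to on this machine."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port) for IPv4.
    return sorted({info[4][0] for info in infos})


def lookup_problems(hostname: str) -> List[str]:
    """List reasons why `hostname` may not be an address other nodes can reach (hypothetical check)."""
    try:
        addrs = resolve_all(hostname)
    except socket.gaierror as exc:
        return [f"{hostname!r} does not resolve: {exc}"]
    problems = []
    for addr in addrs:
        # Any 127.0.0.0/8 address is only reachable from the node itself.
        if ipaddress.ip_address(addr).is_loopback:
            problems.append(f"{hostname!r} resolves to loopback address {addr}")
    return problems
```

Running `lookup_problems(socket.gethostname())` on every node, and comparing `resolve_all(...)` for the same node name from different machines, can surface both kinds of breakage. If a node does resolve its own name wrongly, Spark's `SPARK_LOCAL_IP` environment variable (set in `conf/spark-env.sh`) pins the address that node binds to, although repairing the lookup itself is the cleaner fix.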