A Spark application running in YARN cluster mode keeps logging the following error, more than 4,000 times in a single run.
19/01/27 03:04:50,824 ERROR client.TransportClient: Failed to send request StreamChunkId{streamId=1779570868760, chunkIndex=435} to xxxx java.io.IOException: Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.transferTo(SaslEncryption.java:219)
at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:254)
at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:237)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:281)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:761)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.forceFlush(AbstractNioChannel.java:317)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:519)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
Strangely, the application still runs to completion and produces its results.
Environment: Spark 1.6, YARN cluster mode, 30 executors, each with 5 cores and 15 GB of memory.
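For reference, the setup above corresponds roughly to the submit command assembled below. This is only a sketch: the main class `com.example.MyApp` and the jar `my-app.jar` are placeholders, not the real names, and it assumes the 15 GB figure is the per-executor memory setting.

```python
import shlex


def build_submit_command(main_class="com.example.MyApp", app_jar="my-app.jar"):
    """Return spark-submit arguments matching the resources described above.

    main_class and app_jar are hypothetical placeholders.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",   # YARN cluster mode
        "--num-executors", "30",      # 30 executors
        "--executor-cores", "5",      # 5 cores per executor
        "--executor-memory", "15g",   # assumed: 15 GB per executor
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe line.
    print(" ".join(shlex.quote(arg) for arg in build_submit_command()))
```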
I would like to find the root cause of these errors.