"Connection reset by peer" and "No space left on device" errors with Apache Spark?

Time: 2016-09-06 08:44:15

Tags: apache-spark spark-streaming

I frequently see the following in my stack traces:

WARN TransportChannelHandler: Exception in connection from /172.31.3.245:46014
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)

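Eventually I get a "No space left on device" error. After some research I found that I can call .set("spark.local.dir", "/home/ubuntu/sparktempdata"); on the Spark configuration. That reduced how often the "No space left on device" error shows up in my traces, but one case like the following still remains, and I don't know how to fix it.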

16/09/06 08:34:18 ERROR FileAppender: Error writing stream to file /usr/local/spark/work/app-20160906083355-0000/1/stderr
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:326)
    at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
    at org.apache.spark.util.logging.FileAppender$$anonfun$appendStreamToFile$1.apply$mcV$sp(FileAppender.scala:75)
    at org.apache.spark.util.logging.FileAppender$$anonfun$appendStreamToFile$1.apply(FileAppender.scala:62)
    at org.apache.spark.util.logging.FileAppender$$anonfun$appendStreamToFile$1.apply(FileAppender.scala:62)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1287)
    at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:78)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1857)
    at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
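
The failing write above targets a per-executor stderr file under the worker's work directory. As a rough diagnostic (an illustrative sketch, not something from the original post), the following Java program totals the bytes under each application directory there, which may show which runs are consuming the disk. The class name `WorkDirUsage` and its methods are hypothetical; the default path is the one taken from the log line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark: sums file sizes under a work directory.
final class WorkDirUsage {

    // Total size in bytes of all regular files under `root` (recursive).
    static long sizeOf(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // File vanished or is unreadable; count it as 0 bytes.
                                return 0L;
                            }
                        })
                        .sum();
        }
    }

    // Maps each immediate child of `workDir` (e.g. an app-... directory) to its total size.
    static Map<String, Long> usageByApp(Path workDir) throws IOException {
        Map<String, Long> usage = new TreeMap<>();
        try (Stream<Path> children = Files.list(workDir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                usage.put(child.getFileName().toString(), sizeOf(child));
            }
        }
        return usage;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the work directory that appears in the error message.
        Path workDir = Paths.get(args.length > 0 ? args[0] : "/usr/local/spark/work");
        usageByApp(workDir).forEach((app, bytes) ->
            System.out.printf("%-40s %,d bytes%n", app, bytes));
    }
}
```

Spark's standalone-mode documentation describes settings such as `SPARK_WORKER_DIR` and `spark.worker.cleanup.enabled` for relocating or pruning this directory; whether they apply here depends on the deployment.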

When I open the file /usr/local/spark/work/app-20160906083355-0000/1/stderr, I see the following:

INFO Utils: Fetching spark://172.31.11.187:58519/jars/analytics-1.0-SNAPSHOT.jar to /tmp/spark-69b1866b-f302-4ab8-a25f-f2a8cc1f4b4f/executor-99c9eeb0-d45c-4619-8054-7f6d3f15803c/spark-c28a16b5-5ac5-440b-9e4d-7ed1b1b8bcbe/fetchFileTemp6564441043886275791.tmp
16/09/06 08:34:18 WARN TransportChannelHandler: Exception in connection from /172.31.11.187:58519
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:51)
        at sun.nio.ch.SinkChannelImpl.write(SinkChannelImpl.java:167)
        at org.apache.spark.rpc.netty.NettyRpcEnv$FileDownloadCallback.onData(NettyRpcEnv.scala:395)
        at org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:69)
        at org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:202)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:70)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.pro

This is the output of df -h on my worker node. All of my machines have the same amount of resources.

Filesystem      Size  Used Avail Use% Mounted on
udev            7.4G   12K  7.4G   1% /dev
tmpfs           1.5G  344K  1.5G   1% /run
/dev/xvda1      7.8G  7.3G   92M  99% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            7.4G     0  7.4G   0% /run/shm
none            100M     0  100M   0% /run/user
/dev/xvdb        37G   49M   35G   1% /mnt
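
As a cross-check on the df -h output above, here is a minimal Java sketch (not part of the original question) that reports the usable space on the filesystem backing each directory a Spark worker might write to. The directory list is an assumption built from the paths quoted in this question plus /mnt, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: reports free space per directory.
final class DiskSpaceCheck {

    // Usable bytes on the filesystem that holds `dir`.
    // Walks up to the nearest existing ancestor so not-yet-created directories still resolve.
    static long usableBytes(Path dir) throws IOException {
        Path p = dir.toAbsolutePath();
        while (p != null && !Files.exists(p)) {
            p = p.getParent();
        }
        if (p == null) {
            throw new IOException("No existing ancestor for " + dir);
        }
        FileStore store = Files.getFileStore(p);
        return store.getUsableSpace();
    }

    // Returns the candidate whose filesystem has the most usable space
    // (the first one wins on ties; null if the list is empty).
    static Path roomiest(List<Path> candidates) throws IOException {
        Path best = null;
        long bestBytes = -1;
        for (Path c : candidates) {
            long bytes = usableBytes(c);
            if (bytes > bestBytes) {
                best = c;
                bestBytes = bytes;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Assumed candidates: paths mentioned in the question, plus the /mnt volume.
        List<Path> dirs = Arrays.asList(
            Paths.get("/usr/local/spark/work"),
            Paths.get("/tmp"),
            Paths.get("/home/ubuntu/sparktempdata"),
            Paths.get("/mnt"));
        for (Path d : dirs) {
            System.out.printf("%-30s %d MB usable%n", d, usableBytes(d) / (1024 * 1024));
        }
        System.out.println("Most free space: " + roomiest(dirs));
    }
}
```

In the df output above, /usr/local/spark/work, /tmp, and /home/ubuntu/sparktempdata have no mount of their own, so they would all appear to sit on the root volume /dev/xvda1 (99% used), whereas /mnt is on /dev/xvdb with 35G available.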

1 answer:

Answer 0 (score: -1)

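I think this error is caused by a broken pipe. Basically, the client (say, your laptop) has not heard anything from the server for a long time, so it assumes it is no longer connected. Use the SIGPIPE command and set it to 2 minutes. The following link will help you.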

broken pipe link