Spark application java.lang.OutOfMemoryError: Direct buffer memory

Time: 2016-01-21 11:45:03

Tags: java apache-spark out-of-memory

I am using the following runtime Spark configuration values:

    spark-submit --executor-memory 8G   --spark.yarn.executor.memoryOverhead 2G

But I still get an out-of-memory error.

I have a pairRDD with 8362269460 rows and a partition size of 128. Calling pairRDD.groupByKey.saveAsTextFile raises this error. Any clues?

Update: I added a filter, and the data is now 2300000000 rows. Running in the Spark shell, there is no error.

My cluster: 19 datanodes, 1 namenode

                 Min Resources: <memory:150000, vCores:150>
                 Max Resources: <memory:300000, vCores:300>
    

Thanks for your help.

    org.apache.spark.shuffle.FetchFailedException: java.lang.OutOfMemoryError: Direct buffer memory
      at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:321)
      at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:306)
      at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
      at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
      at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
      at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:132)
      at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
      at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:89)
      at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      at org.apache.spark.scheduler.Task.run(Task.scala:88)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)
    Caused by: io.netty.handler.codec.DecoderException:  Direct buffer memory
      at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234)
      at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
      at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
      at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
      at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
      at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      ... 1 more
    Caused by: java.lang.OutOfMemoryError: Direct buffer memory
      at java.nio.Bits.reserveMemory(Bits.java:658)
      at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
      at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
      at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:651)
      at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
      at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
      at io.netty.buffer.PoolArena.reallocate(PoolArena.java:358)
      at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:121)
      at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
      at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
      at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
      at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
      at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
      at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:228)
      ... 10 more
    )
    

I would like to know how to configure the direct memory size correctly. Best regards

1 answer:

Answer 0 (score: 2)

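I don't know any details about the Spark app, but I found the memory configuration here. You need to set -XX:MaxDirectMemorySize, just like any other JVM memory setting (passed via -XX:). Try using spark.executor.extraJavaOptions.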

If you are using spark-submit, you can use:

./bin/spark-submit --name "My app" ...
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxDirectMemorySize=512m" myApp.jar
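
To check whether a limit like the one above is big enough, it can help to see how much direct buffer memory a JVM is actually holding. Here is a minimal sketch in plain Java (no Spark dependency) that reads the JVM's "direct" buffer pool through the standard `BufferPoolMXBean`. The class name `DirectMemoryProbe` and the 16 MB test allocation are made up for illustration. It only counts buffers created through `ByteBuffer.allocateDirect`, which is the path shown in the stack trace in the question; other allocation paths may not show up here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Spark or Netty.
class DirectMemoryProbe {

    /** Returns the bytes currently used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    public static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();

        // Allocate 16 MB off-heap; this counts against -XX:MaxDirectMemorySize.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        long after = directMemoryUsed();
        System.out.println("direct memory used before: " + before + " bytes");
        System.out.println("direct memory used after:  " + after + " bytes");
        System.out.println("buffer capacity:           " + buffer.capacity() + " bytes");
    }
}
```

Running it with a small limit, for example `java -XX:MaxDirectMemorySize=8m DirectMemoryProbe`, should fail with the same `java.lang.OutOfMemoryError: Direct buffer memory` as in the question, since the 16 MB request exceeds the 8 MB cap.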