PySpark: OOM unable to create new native thread

Date: 2015-09-13 07:18:24

Tags: multithreading apache-spark pyspark

I am running a Spark job from the pyspark shell and getting

Exception in thread "stdout writer for python2.7" java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:92)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
...
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

The job selects a DataFrame from a Parquet file, converts it to an RDD, performs some transformations, and stores the result in another Parquet file.
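
For reference, the job is roughly shaped like this (a minimal sketch; the paths, column names and the transformation itself are placeholders, not the actual code):

    # Minimal sketch of the job shape; paths, column names and the
    # transformation are placeholders rather than the real code.
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)  # sc is the SparkContext provided by the pyspark shell

    df = sqlContext.read.parquet("hdfs:///input/data.parquet")  # hypothetical input path

    # DataFrame -> RDD, apply the transformation, then back to a DataFrame
    transformed = df.rdd.map(lambda row: (row[0], row[1]))      # placeholder transformation
    result = sqlContext.createDataFrame(transformed, ["key", "value"])

    result.write.parquet("hdfs:///output/result.parquet")       # hypothetical output path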

The same problem occurs when running in 1-node standalone mode as well as on a small 2-node YARN cluster. It persists even when I increase the default pyspark shell parameters:

pyspark --driver-memory 8G --executor-memory 8G
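
In case it matters, the same memory settings can also be expressed through SparkConf when building the SparkContext manually (a minimal sketch; note that driver memory normally has to be set before the driver JVM starts, which is why I pass the command-line flags above):

    # Same memory settings expressed through SparkConf; this assumes building
    # a SparkContext manually instead of using the pyspark shell's default one.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.driver.memory", "8g")      # same as --driver-memory 8G
            .set("spark.executor.memory", "8g"))   # same as --executor-memory 8G
    sc = SparkContext(conf=conf)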

Before the exception is thrown (and processing stops), there are a couple of warnings in the console:

15/09/13 00:12:17 WARN TaskSetManager: Stage 186 contains a task of very large size (4722 KB). The maximum recommended task size is 100 KB.
15/09/13 00:12:19 ERROR ShuffleBlockFetcherIterator: Error occurred while fetching local blocks
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:239)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:269)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:115)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:76)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

15/09/13 00:12:21 WARN TaskSetManager: Stage 369 contains a task of very large size (4722 KB). The maximum recommended task size is 100 KB.
15/09/13 00:12:21 ERROR ShuffleBlockFetcherIterator: Error occurred while fetching local blocks
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:239)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:269)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:115)
    at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:76)
    at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)

Some answers to similar questions mention that this happens because Spark's memory is not tuned properly. How do I tune Spark memory, and which settings affect this behavior?

The first stack trace is huge and consists of the same few methods called recursively. If I also monitor the Spark JVM at the moment of the crash, the number of currently running threads spikes dramatically and goes through the roof. All of this looks like some poorly behaved internal implementation built on Java's ThreadPoolExecutor that does not limit the number of threads when tasks are processed more slowly than new ones arrive.
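
For reference, this is roughly how I watch the thread count of the Spark JVM (a minimal sketch using psutil; the heuristic for finding the JVM process is an assumption and may need adjusting for a particular setup):

    # Minimal sketch for watching the Spark JVM's thread count over time.
    # Finding the JVM by matching "java"/"spark" is a heuristic, not exact.
    import time
    import psutil

    def spark_jvm_processes():
        for proc in psutil.process_iter(attrs=["pid", "name", "cmdline"]):
            name = proc.info["name"] or ""
            cmdline = " ".join(proc.info["cmdline"] or [])
            if "java" in name and "spark" in cmdline.lower():
                yield proc

    while True:
        for proc in spark_jvm_processes():
            try:
                print(proc.info["pid"], proc.num_threads())
            except psutil.NoSuchProcess:
                pass  # the JVM may have just exited
        time.sleep(1)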

Which configuration knobs could help me prevent this sudden spike in thread creation?

0 Answers:

There are no answers.