I tried upgrading to Apache Spark 1.6.0 RC3. My application now sees these errors on almost every task:
Managed memory leak detected; size = 15735058 bytes, TID = 830
I set the logging level for org.apache.spark.memory.TaskMemoryManager to DEBUG, and I see in the logs:
I2015-12-18 16:54:41,125 TaskSetManager: Starting task 0.0 in stage 7.0 (TID 6, localhost, partition 0,NODE_LOCAL, 3026 bytes)
I2015-12-18 16:54:41,125 Executor: Running task 0.0 in stage 7.0 (TID 6)
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,130 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,188 TaskMemoryManager: Task 6 acquire 5.0 MB for null
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
I2015-12-18 16:54:41,199 ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
D2015-12-18 16:54:41,262 TaskMemoryManager: Task 6 acquire 5.0 MB for null
D2015-12-18 16:54:41,397 TaskMemoryManager: Task 6 release 5.0 MB from null
E2015-12-18 16:54:41,398 Executor: Managed memory leak detected; size = 5245464 bytes, TID = 6
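(For reference, a minimal sketch of how that DEBUG level can be enabled, assuming Spark 1.x's standard conf/log4j.properties; the exact file location depends on your deployment:)

```properties
# Hypothetical conf/log4j.properties entry: turn on DEBUG logging
# only for the TaskMemoryManager class (Spark 1.x ships log4j 1.2).
log4j.logger.org.apache.spark.memory.TaskMemoryManager=DEBUG
```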
How can I debug these errors? Is there a way to log a stack trace for each allocation and deallocation, so I can find what is leaking?
I don't know much about the new unified memory manager (SPARK-10000). Is the leak likely my fault, or is it likely a Spark bug?
Answer 0 (score: 25)
The short answer is that users are not supposed to see this message. Users are not supposed to be able to create memory leaks in the unified memory manager.
That such leaks happen at all is a Spark bug: SPARK-11293
But if you want to understand the cause of a memory leak, this is how I did it:
Add extra logging to the acquireExecutionMemory and releaseExecutionMemory methods in TaskMemoryManager.java: logger.error("stack trace:", new Exception());
Now you will see a full stack trace for every allocation and deallocation. Try to match them up and find the allocations that have no matching deallocation. You now have the stack trace of the source of the leak.
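A minimal sketch of what that patch could look like (the method names come from this answer; the signatures and surrounding logic are simplified assumptions, so check the actual TaskMemoryManager.java in your Spark source tree):

```java
// Inside org.apache.spark.memory.TaskMemoryManager (Spark 1.6 source tree).
// Signatures simplified for illustration; "..." marks the original logic.

public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
  // Log the full call stack of every allocation; new Exception() is only
  // created for its stack trace, it is never thrown.
  logger.error("acquire " + required + " bytes, stack trace:", new Exception());
  // ... original allocation logic ...
}

public void releaseExecutionMemory(long size, MemoryConsumer consumer) {
  // Log the full call stack of every deallocation.
  logger.error("release " + size + " bytes, stack trace:", new Exception());
  // ... original release logic ...
}
```

Grepping the resulting log for "acquire" and "release" and pairing entries by task and size then points at the call site that never released its memory.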
Answer 1 (score: 1)
I also ran into this warning message, and in my case it was caused by df.repartition(rePartNum, df("id")). My df was empty, and the number of warning lines was equal to rePartNum.
Version: Spark 2.4, Windows 10.
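A hypothetical, self-contained reproduction of that scenario, rewritten here with the Java API (the class name, local master, and the count() action are illustrative assumptions, not from the answer):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RepartitionWarningRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("repartition-warning-repro")
        .master("local[*]")
        .getOrCreate();

    // An empty DataFrame with a single "id" column.
    Dataset<Row> df = spark.range(0).toDF();

    int rePartNum = 8;
    // Per this answer, repartitioning an empty DataFrame by a column can
    // emit one "Managed memory leak detected" warning per shuffle task,
    // i.e. rePartNum warnings in total.
    df.repartition(rePartNum, df.col("id")).count();

    spark.stop();
  }
}
```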