Spark 2.3 executor memory leak

Posted: 2018-05-25 09:40:32

Tags: python python-3.x apache-spark memory-leaks pyspark

I am getting a memory leak warning which, as far as I can tell, was a Spark bug that existed up to version 1.6 and has since been resolved.

Mode: Standalone; IDE: PyCharm; Spark version: 2.3; Python version: 3.6

Below is the stack trace -

2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3148
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3152
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3151
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3150
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3149
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3153
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3154
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3158
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3155
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3157
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3160
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3161
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3156
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3159
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3165
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3163
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3162
2018-05-25 15:00:05 WARN  Executor:66 - Managed memory leak detected; size = 262144 bytes, TID = 3166

Any insight into why this might be happening? My job does complete successfully, though.

Edit: Many have said this is a duplicate of a question from 2 years ago, but the answer there says it is a Spark bug, and Spark's Jira shows that issue as resolved.

The question here is: so many releases later, why am I still seeing this in Spark 2.3? If there is a valid or reasonable answer to my query, I will gladly delete this question.

1 Answer:

Answer 0 (score: 1)

According to SPARK-14168, the warning stems from not consuming the entire iterator. I ran into the same warning when taking n elements from an RDD in the Spark shell.
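For illustration, here is a minimal PySpark sketch of the pattern that answer describes. The session setup, data sizes, and the `groupByKey` step are assumptions rather than the asker's actual job, and whether the warning actually appears depends on how much execution memory the shuffle allocates.

    from pyspark.sql import SparkSession

    # Minimal sketch, assuming a local standalone setup similar to the question.
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("managed-memory-leak-demo") \
        .getOrCreate()
    sc = spark.sparkContext

    # A shuffle allocates managed (execution) memory on the executor.
    pairs = sc.parallelize(range(100000), 8).map(lambda x: (x % 100, x))
    grouped = pairs.groupByKey()

    # take(n) stops reading each partition's iterator as soon as it has enough
    # rows, so the shuffle reader's memory may not yet be released when the
    # task finishes; this is the situation SPARK-14168 describes, and Spark
    # then logs "Managed memory leak detected" while still freeing the memory.
    print(grouped.take(5))

    # Fully consuming the data (count/collect) drains the iterators and
    # normally does not trigger the warning.
    print(grouped.count())

    spark.stop()

In other words, the warning is bookkeeping rather than a genuine leak: when a task ends, the executor releases any managed memory the task did not free itself and logs what it had to clean up, which is why the job still finishes successfully.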