Python memory leak: data loaded into a class attribute is not collected

Asked: 2018-03-15 00:41:28

Tags: python python-2.7 memory-leaks

I am using Python 2.7.14. The problem reproduces on both OSX and Linux.

I have a Python class:

import cPickle

class TestClass:
    def __init__(self, path_to_data=None):
        self.loaded_data = None
        if path_to_data:
            self.load(path_to_data)

    def load(self, path_to_data):
        self.loaded_data = None
        with open(path_to_data, 'rb') as f:  # pickle data should be read in binary mode
            self.loaded_data = cPickle.load(f)

You can create a decently sized pickled dictionary:

>>> import cPickle
>>> d = {x:x+1 for x in range(1000000)}
>>> with open('testdict.pkl', 'wb') as f:
...     cPickle.dump(d, f)
...

and reproduce the problem like this:

>>> from test_py import TestClass
>>> import psutil
>>> import os
>>> process = psutil.Process(os.getpid())
>>> process.memory_info()
pmem(rss=8085504L, vms=4405288960L, pfaults=2154, pageins=0)
>>>
>>> t = TestClass('testdict.pkl')
>>> process.memory_info()
pmem(rss=155897856L, vms=4552028160L, pfaults=38241, pageins=0)
>>>
>>> t = TestClass('testdict.pkl')
>>> process.memory_info()
pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)
>>>
>>> del t
>>> process.memory_info()
pmem(rss=255520768L, vms=4651646976L, pfaults=62563, pageins=0)
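
One detail of the transcript above that may matter: in `t = TestClass('testdict.pkl')` the right-hand side is evaluated before the name t is rebound, so while the second load runs, the first instance and its data are still alive. The sketch below illustrates only that ordering, assuming CPython's reference counting. `Loader` and `live_instances` are hypothetical stand-ins, not part of the original code, and no real data is loaded.

```python
import weakref

# Tracks live instances without keeping them alive.
live_instances = weakref.WeakSet()

class Loader(object):
    """Hypothetical stand-in for TestClass (no real data is loaded)."""
    def __init__(self):
        # How many earlier instances still exist while this one is being built.
        self.others_alive = len(live_instances)
        live_instances.add(self)

t = Loader()
t = Loader()              # the old instance is still bound to t during __init__
overlap = t.others_alive

t = None                  # drop the old instance before building the next one
t = Loader()
no_overlap = t.others_alive

print("alive during plain rebinding: %d" % overlap)
print("alive after releasing first: %d" % no_overlap)
```

If peak memory is the concern, setting t to None (or using del t) before the second load avoids holding both copies at once. Whether rss then drops still depends on the allocator.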

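Why is the memory not being garbage collected? Something else does not add up either: sys.getsizeof(t.loaded_data) returns only 50331928, but the difference in rss between the two loads is larger than that. Is this a bug or a feature I do not understand, and how can I avoid it?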
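
On the size mismatch: sys.getsizeof is documented to count only the memory directly attributed to the object, so for a dict it covers the hash table but not the key and value objects it refers to. A minimal sketch of the difference, assuming a plain dict of ints like the one pickled above (smaller here so it runs quickly). The helper name `deep_size_of_int_dict` is hypothetical, and even the "deep" figure ignores allocator overhead, so it is not expected to match rss exactly.

```python
import sys

def deep_size_of_int_dict(d):
    """Shallow size of the dict plus the size of each distinct key/value object."""
    seen = set()                  # ids already counted, in case objects are shared
    total = sys.getsizeof(d)      # the dict's own hash table only
    for k, v in d.items():
        for obj in (k, v):
            if id(obj) not in seen:
                seen.add(id(obj))
                total += sys.getsizeof(obj)
    return total

d = {x: x + 1 for x in range(100000)}
shallow = sys.getsizeof(d)
deep = deep_size_of_int_dict(d)
print("shallow: %d bytes" % shallow)
print("deep:    %d bytes" % deep)
```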

Thanks!

Edit

For those pointing out that cPickle might have a memory leak, here is a variant that does not use it:

from marisa_trie import Trie

class TestClass:
    def __init__(self, path_to_data=None):
        self.loaded_data = None
        if path_to_data:
            self.load(path_to_data)

    def load(self, path_to_data):
        self.loaded_data = None
        self.loaded_data = Trie().load(path_to_data)

Running this script:

from test_py import TestClass
import psutil
import os
import gc
process = psutil.Process(os.getpid())
print 'empty process:', process.memory_info()
t = TestClass('testtrie.trie')
print 'first load:', process.memory_info()
t = TestClass('testtrie.trie')
print 'second load:', process.memory_info()
gc.collect()
print 'after gc.collect:', process.memory_info()

prints:

empty process: pmem(rss=8052736L, vms=4405383168L, pfaults=2158, pageins=134)
first load: pmem(rss=9801728L, vms=4406640640L, pfaults=2585, pageins=158)
second load: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)
after gc.collect: pmem(rss=11382784L, vms=4407898112L, pfaults=2971, pageins=158)

Here testtrie.trie was built as follows:

from marisa_trie import Trie
Trie(unicode(x) for x in range(1000000)).save('testtrie.trie')
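
An unchanged rss after del or gc.collect() does not by itself show that the objects are still alive: the allocator may keep freed memory for reuse instead of returning it to the operating system. One way to separate the two cases is to watch the object through a weak reference. This is a minimal sketch, assuming CPython's reference counting. `Payload` is a hypothetical stand-in for TestClass that builds its dict in memory instead of loading a file.

```python
import gc
import weakref

class Payload(object):
    """Hypothetical stand-in for TestClass; holds a large dict."""
    def __init__(self, n):
        self.loaded_data = {x: x + 1 for x in range(n)}

t = Payload(100000)
ref = weakref.ref(t)          # a weak reference does not keep the object alive
assert ref() is t

t = Payload(100000)           # rebinding drops the last strong reference to the first object
first_freed = ref() is None

ref2 = weakref.ref(t)
del t
gc.collect()                  # only needed for reference cycles; harmless here
second_freed = ref2() is None

print("first instance freed:  %s" % first_freed)
print("second instance freed: %s" % second_freed)
```

If both checks report that the instances are gone while rss stays flat, the memory is being retained below the Python object level rather than leaked by the class.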

0 answers