Objects with circular references are not garbage collected

Date: 2011-12-31 11:42:41

Tags: python garbage-collection

I have a small convenience class that I use a lot in my code. It looks like this:

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

The nice thing about it is that you can access attributes with either dictionary key syntax or the usual object style:

myStructure = Structure(name="My Structure")
print myStructure["name"]
print myStructure.name

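Today I noticed that my application's memory consumption was creeping up in situations where I expected it to go down. It looks to me as if instances created from the Structure class are never collected. Here is a small snippet that illustrates this: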

import gc

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])

It produces the following output:

Structure name:  __16
Structure name:  __16
Structures count:  4096

As you can see, the Structure instance count is still 4096.

So I commented out the line that creates the convenient self-reference:

import gc

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        # self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
# print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])

Now that the circular reference is gone, the output makes sense:

Structure name:  __16
Structures count:  0

I pushed the test a bit further and used Meliae to analyse the memory consumption:

import gc
import pprint
from meliae import scanner
from meliae import loader

class Structure(dict):
    def __init__(self, **kwargs):
        dict.__init__(self, **kwargs)
        self.__dict__ = self

structures = [Structure(name="__{0}".format(str(value))) for value in range(4096)]
print "Structure name: ", structures[16].name
print "Structure name: ", structures[16]["name"]
del structures
gc.collect()
print "Structures count: ", len([obj for obj in gc.get_objects() if type(obj) is Structure])

scanner.dump_all_objects("Test_001.json")
om = loader.load("Test_001.json")
summary = om.summarize()
print summary

structures = om.get_all("Structure")
if structures:
    pprint.pprint(structures[0].c)

This produces the following output:

Structure name:  __16
Structure name:  __16
Structures count:  4096
loading... line 5001, 5002 objs,   0.6 /   1.8 MiB read in 0.2s
loading... line 10002, 10003 objs,   1.1 /   1.8 MiB read in 0.3s
loading... line 15003, 15004 objs,   1.7 /   1.8 MiB read in 0.5s
loaded line 16405, 16406 objs,   1.8 /   1.8 MiB read in 0.5s        
checked        1 /    16406 collapsed        0    
checked    16405 /    16406 collapsed      157    
compute parents        0 /    16249        
compute parents    16248 /    16249        
set parents    16248 /    16249        
collapsed in 0.2s
Total 16249 objects, 58 types, Total size = 3.2MiB (3306183 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0    4096  25   1212416  36  36     296 Structure
     1     390   2    536976  16  52   49432 dict
     2    5135  31    417550  12  65   12479 str
     3      82   0    290976   8  74   12624 module
     4     235   1    212440   6  80     904 type
     5     947   5    121216   3  84     128 code
     6    1008   6    120960   3  88     120 function
     7    1048   6     83840   2  90      80 wrapper_descriptor
     8     654   4     47088   1  92      72 builtin_function_or_method
     9     562   3     40464   1  93      72 method_descriptor
    10     517   3     37008   1  94     216 tuple
    11     139   0     35832   1  95    2280 set
    12     351   2     30888   0  96      88 weakref
    13     186   1     23200   0  97    1664 list
    14      63   0     21672   0  97     344 WeakSet
    15      21   0     18984   0  98     904 ABCMeta
    16     197   1     14184   0  98      72 member_descriptor
    17     188   1     13536   0  99      72 getset_descriptor
    18     284   1      6816   0  99      24 int
    19      14   0      5296   0  99    2280 frozenset
[Structure(4312707312 296B 2refs 2par),
 type(4298634592 904B 4refs 100par 'Structure')]

Memory usage is 3.2 MiB. Removing the self-reference line leads to the following output:

Structure name:  __16
Structures count:  0
loading... line 5001, 5002 objs,   0.6 /   1.4 MiB read in 0.1s
loading... line 10002, 10003 objs,   1.1 /   1.4 MiB read in 0.3s
loaded line 12308, 12309 objs,   1.4 /   1.4 MiB read in 0.4s        
checked       12 /    12309 collapsed        0    
checked    12308 /    12309 collapsed      157    
compute parents        0 /    12152        
compute parents    12151 /    12152        
set parents    12151 /    12152        
collapsed in 0.1s
Total 12152 objects, 57 types, Total size = 2.0MiB (2093714 bytes)
 Index   Count   %      Size   % Cum     Max Kind
     0     390   3    536976  25  25   49432 dict
     1    5134  42    417497  19  45   12479 str
     2      82   0    290976  13  59   12624 module
     3     235   1    212440  10  69     904 type
     4     947   7    121216   5  75     128 code
     5    1008   8    120960   5  81     120 function
     6    1048   8     83840   4  85      80 wrapper_descriptor
     7     654   5     47088   2  87      72 builtin_function_or_method
     8     562   4     40464   1  89      72 method_descriptor
     9     517   4     37008   1  91     216 tuple
    10     139   1     35832   1  92    2280 set
    11     351   2     30888   1  94      88 weakref
    12     186   1     23200   1  95    1664 list
    13      63   0     21672   1  96     344 WeakSet
    14      21   0     18984   0  97     904 ABCMeta
    15     197   1     14184   0  98      72 member_descriptor
    16     188   1     13536   0  98      72 getset_descriptor
    17     284   2      6816   0  99      24 int
    18      14   0      5296   0  99    2280 frozenset
    19      22   0      2288   0  99     104 classobj

This confirms that the Structure instances have been destroyed and that memory usage dropped to 2.0 MiB.

Any idea how I can make sure this class gets properly garbage collected? By the way, all of this was run on Python 2.7.2 (Darwin).

Cheers,

Thomas

1 Answer:

Answer 0: (score: 3)

You can implement your Structure class more directly with __getattr__ and __setattr__, so that attribute access goes to the underlying dictionary.

class Structure(dict):
    def __getattr__(self, k):
        return self[k]
    def __setattr__(self, k, v):
        self[k] = v
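
One caveat with the version above: a missing key surfaces as a KeyError rather than an AttributeError, which can surprise code that relies on getattr(obj, name, default) or on Python 3's hasattr. The following is only a sketch of one way to handle that, not part of the original answer; the __delattr__ method and the example key names are additions made for illustration.

```python
class Structure(dict):
    """dict whose keys can also be read, set and deleted as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # methods such as keys() or items() are still found first.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store every attribute assignment as a dictionary item.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


s = Structure(name="My Structure")
s.size = 3  # "size" is a hypothetical key, stored as s["size"]
```

Because nothing here assigns the instance to its own __dict__, no reference cycle is created. A key that shadows a dict method name (for example "keys") remains reachable only through s["keys"].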

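Cycles are garbage collected in Python, but only periodically (unlike regular reference-counted objects, which are collected as soon as their reference count drops to 0).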
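
To make the difference concrete, here is a minimal sketch, assuming CPython. It uses a plain object (the hypothetical Node class), not the dict subclass from the question, whose output on 2.7.2 shows the instances surviving even an explicit gc.collect(). Weak references are used only to observe whether each object is still alive.

```python
import gc
import weakref


class Node(object):
    """Hypothetical placeholder class used only for this demonstration."""
    pass


gc.disable()  # keep the cyclic collector from running on its own during the demo

# No cycle: freed as soon as the last reference goes away (reference counting).
plain = Node()
plain_ref = weakref.ref(plain)
del plain
freed_without_gc = plain_ref() is None

# Self-referencing cycle: the reference count never reaches zero by itself.
cyclic = Node()
cyclic.me = cyclic
cyclic_ref = weakref.ref(cyclic)
del cyclic
alive_before_collect = cyclic_ref() is not None

gc.collect()  # an explicit pass of the cyclic collector reclaims it
alive_after_collect = cyclic_ref() is not None

gc.enable()
```

On CPython, the object without a cycle should be gone immediately after del, while the self-referencing one should stay alive until the collector runs.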

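Avoiding the cycle (by using __getattr__ and __setattr__ for the Structure class) means you will get better gc behaviour. You may also want to look at collections.namedtuple as an alternative: it does not do exactly what your implementation does, but it may suit your purpose.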
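
For comparison, a rough sketch of what the namedtuple route might look like. The field names "name" and "size" are made up for illustration. Unlike the dict-based class, a namedtuple is immutable, has a fixed set of fields, and is indexed by position rather than by key.

```python
from collections import namedtuple

# Hypothetical field names; a namedtuple's fields are fixed up front.
Structure = namedtuple("Structure", ["name", "size"])

s = Structure(name="My Structure", size=3)

by_attr = s.name             # attribute access works
by_index = s[0]              # positional access works; s["name"] does not
as_dict = s._asdict()        # mapping view of the fields
updated = s._replace(size=4) # returns a new tuple; s itself is unchanged
```

Whether this fits depends on how much the existing code relies on adding keys at runtime or on the myStructure["name"] syntax.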