Why is the size of an empty dict the same as the size of a non-empty dict in Python?

Date: 2013-09-01 13:24:45

Tags: python memory python-2.7 dictionary

This is probably trivial, but I'm not sure I understand it. I tried searching Google, but couldn't find a convincing answer.

>>> import sys
>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
        yet_another_dict[i] = i**2

>>> 
>>> sys.getsizeof(yet_another_dict)
98444

How should I make sense of this? Why does an empty dict have the same size as a non-empty one?

2 answers:

Answer 0 (score: 9):

There are two reasons:

  1. A dictionary only stores references to objects, not the objects themselves, so its size is unrelated to the size of the objects it contains; it depends on the number of references (items) the dictionary holds (see the short sketch after this answer's output).

  2. More importantly, a dictionary pre-allocates memory for references in blocks. So when you create a dictionary, it has already pre-allocated memory for the first n references. When that block fills up, it pre-allocates a new, larger one.

  3. You can observe this behavior by running the following code:

    import sys

    d = {}
    size = sys.getsizeof(d)
    print size
    i = 0
    j = 0
    # Keep adding keys until we have observed three size jumps.
    while i < 3:
        d[j] = j
        j += 1
        new_size = sys.getsizeof(d)
        if size != new_size:
            # The dict just pre-allocated a new, larger block of slots.
            print new_size
            size = new_size
            i += 1
    

    This prints:

    280
    1048
    3352
    12568
    

    on my machine; the exact values depend on the architecture (32-bit vs. 64-bit).
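A quick sketch of point 1 (Python 2; the variable names are just for illustration): getsizeof reports only the dict's own structure, not the objects it references, so a dict holding a large value reports the same size as one holding a tiny value.

    import sys

    small = {'k': 'x'}            # value is a 1-character string
    large = {'k': 'x' * 10 ** 6}  # value is a string of a million characters

    # Both dicts report the same size: getsizeof counts the dict's own
    # slot table, not the memory of the objects the slots point to.
    print sys.getsizeof(small) == sys.getsizeof(large)   # True

    # The referenced string's memory is accounted for separately.
    print sys.getsizeof(large['k'])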

Answer 1 (score: 7):

Dictionaries in CPython allocate a small amount of space for keys directly inside the dict object itself (4-8 entries, depending on the version and compile options). From dictobject.h:

/* PyDict_MINSIZE is the minimum size of a dictionary.  This many slots are
 * allocated directly in the dict object (in the ma_smalltable member).
 * It must be a power of 2, and at least 4.  8 allows dicts with no more
 * than 5 active entries to live in ma_smalltable (and so avoid an
 * additional malloc); instrumentation suggested this suffices for the
 * majority of dicts (consisting mostly of usually-small instance dicts and
 * usually-small dicts created to pass keyword arguments).
 */
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8
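A minimal sketch of the consequence, assuming a CPython 2.7 build where PyDict_MINSIZE is 8: as long as the entries still fit in ma_smalltable, the reported size stays at the empty-dict size.

    import sys

    d = {}
    empty_size = sys.getsizeof(d)
    for i in xrange(5):
        d[i] = i
        # With 5 or fewer entries the keys still live in ma_smalltable,
        # so no extra allocation happens and the reported size stays
        # equal to the empty-dict size.
        print len(d), sys.getsizeof(d) == empty_size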

Note that CPython also resizes dictionaries in large steps, to avoid resizing them frequently. From dictobject.c:

/* If we added a key, we can safely resize.  Otherwise just return!
 * If fill >= 2/3 size, adjust size.  Normally, this doubles or
 * quaduples the size, but it's also possible for the dict to shrink
 * (if ma_fill is much larger than ma_used, meaning a lot of dict
 * keys have been * deleted).
 *
 * Quadrupling the size improves average dictionary sparseness
 * (reducing collisions) at the cost of some memory and iteration
 * speed (which loops over every possible entry).  It also halves
 * the number of expensive resize operations in a growing dictionary.
 *
 * Very large dictionaries (over 50K items) use doubling instead.
 * This may help applications with severe memory constraints.
 */
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
    return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
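A rough sketch of that resize behavior, assuming CPython 2.7 (the exact sizes vary by build): the reported size jumps only once the table passes the 2/3 fill threshold, and for a small dict the new table holds roughly four times ma_used slots.

    import sys

    d = {}
    last = sys.getsizeof(d)
    for i in xrange(2000):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last:
            # The jump happens once the table is about 2/3 full; for a dict
            # this small, the new table is about 4 * ma_used slots.
            print "resize at len(d) == %d: getsizeof %d -> %d" % (len(d), last, size)
            last = size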