This may be trivial, but I'm not sure I understand it. I tried searching Google but didn't find a convincing answer.
>>> import sys
>>> sys.getsizeof({})
140
>>> sys.getsizeof({'Hello':'World'})
140
>>>
>>> yet_another_dict = {}
>>> for i in xrange(5000):
...     yet_another_dict[i] = i**2
...
>>> sys.getsizeof(yet_another_dict)
98444
How do I make sense of this? Why is an empty dictionary the same size as a non-empty one?
Answer 0 (score: 9)
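There are two reasons:
A dictionary holds only references to objects, not the objects themselves, so its size does not depend on the size of the objects it contains but on the number of references (items) it holds.
More importantly, a dictionary preallocates memory for references in chunks. When you create a dictionary, it has already preallocated memory for the first n references. When that memory fills up, it preallocates a new, larger chunk.
You can observe this behavior by running the following code.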
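import sys

d = {}
size = sys.getsizeof(d)
print(size)  # size of the empty dict
i = 0  # number of size changes seen so far
j = 0  # next key to insert
while i < 3:
    d[j] = j
    j += 1
    new_size = sys.getsizeof(d)
    if size != new_size:
        # the dict just grew: report the new size
        print(new_size)
        size = new_size
        i += 1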
This prints:
280
1048
3352
12568
on my machine, but the exact numbers depend on the architecture (32-bit or 64-bit).
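The first reason above, that a dictionary stores only references, can be checked directly. The following is a minimal sketch, not a general-purpose memory profiler: the helper name `shallow_and_deep_size` is made up for illustration, and the "deep" figure simply adds `sys.getsizeof` of each key and value once, ignoring nested containers and objects shared between entries.

```python
import sys


def shallow_and_deep_size(d):
    """Return (size of the dict itself, dict plus its keys and values) in bytes.

    Hypothetical helper: the deep figure is a rough estimate only.
    """
    shallow = sys.getsizeof(d)
    deep = shallow + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items())
    return shallow, deep


small = {'Hello': 'World'}
large = {'Hello': 'World' * 100000}  # same key, much larger value

small_shallow, small_deep = shallow_and_deep_size(small)
large_shallow, large_deep = shallow_and_deep_size(large)

# The dict itself reports the same size in both cases...
print("shallow: %d vs %d" % (small_shallow, large_shallow))
# ...while the estimate that includes the referenced objects differs a lot.
print("deep:    %d vs %d" % (small_deep, large_deep))
```

The absolute numbers vary by interpreter version and architecture; what should hold is that the two shallow sizes match while the deep estimates do not.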
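Answer 1 (score: 7)
Dictionaries in CPython allocate a small amount of key space directly in the dictionary object itself (4-8 entries, depending on version and compilation options). From dictobject.h: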
/* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are
* allocated directly in the dict object (in the ma_smalltable member).
* It must be a power of 2, and at least 4. 8 allows dicts with no more
* than 5 active entries to live in ma_smalltable (and so avoid an
* additional malloc); instrumentation suggested this suffices for the
* majority of dicts (consisting mostly of usually-small instance dicts and
* usually-small dicts created to pass keyword arguments).
*/
#ifndef Py_LIMITED_API
#define PyDict_MINSIZE 8
Note that CPython also resizes the dictionary in batches, to avoid frequent reallocations as a dictionary grows. From dictobject.c:
/* If we added a key, we can safely resize. Otherwise just return!
* If fill >= 2/3 size, adjust size. Normally, this doubles or
 * quadruples the size, but it's also possible for the dict to shrink
 * (if ma_fill is much larger than ma_used, meaning a lot of dict
 * keys have been deleted).
*
* Quadrupling the size improves average dictionary sparseness
* (reducing collisions) at the cost of some memory and iteration
* speed (which loops over every possible entry). It also halves
* the number of expensive resize operations in a growing dictionary.
*
* Very large dictionaries (over 50K items) use doubling instead.
* This may help applications with severe memory constraints.
*/
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2))
    return 0;
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);
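The resize policy quoted above can be replayed in a few lines of Python. This is a rough simulation under stated assumptions, not CPython's implementation: it models only insertions of distinct keys (so ma_fill equals ma_used), it assumes dictresize rounds the requested size up to the smallest power of two strictly greater than it (starting from PyDict_MINSIZE), and it assumes a 32-bit build where each table entry takes 12 bytes (hash, key pointer, value pointer) and an empty dict takes the 140 bytes reported in the question. The function name `simulate_growth` is made up for illustration.

```python
MIN_SLOTS = 8  # PyDict_MINSIZE from the dictobject.h excerpt


def simulate_growth(n_items):
    """Replay the quoted resize rule for n_items insertions and no deletions.

    Returns (final slot count, list of (items at resize, new slot count)).
    """
    slots = MIN_SLOTS
    used = 0  # with no deletions, ma_fill == ma_used
    resizes = []
    for _ in range(n_items):
        used += 1
        # Resize once the table is at least 2/3 full.
        if used * 3 >= slots * 2:
            # Quadruple for small dicts, double above 50000 items.
            min_used = (2 if used > 50000 else 4) * used
            # Assumption: round up to the next power of two above min_used.
            new_slots = MIN_SLOTS
            while new_slots <= min_used:
                new_slots <<= 1
            slots = new_slots
            resizes.append((used, slots))
    return slots, resizes


final_slots, resizes = simulate_growth(5000)
for used, slots in resizes:
    print("resize at %d items -> %d slots" % (used, slots))

# Assumed 32-bit layout: 140-byte dict object plus 12 bytes per external slot.
estimated_bytes = 140 + final_slots * 12
print("estimated size for 5000 items: %d bytes" % estimated_bytes)
```

Under these assumptions the first resize happens at the sixth item, which is why a dict with one entry reports the same size as an empty one, and the estimate for 5000 items comes out to 98444 bytes, the figure shown in the question.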