Question

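I have a large binary file that I want to read in and unpack with struct.unpack(). The file consists of multiple lines, each 2957 bytes long. I read the file with the following code: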

with open("bin_file", "rb") as f:
    line = f.read(2957)

My question is: why is the size returned by

import sys
sys.getsizeof(line)

not equal to 2957 (in my case it is 2978)?
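For context, here is a minimal sketch of the record-by-record read-and-unpack loop the question describes. The record layout "<IB2952s" (a 4-byte unsigned int, a 1-byte flag, a 2952-byte payload) and the name read_records are hypothetical, because the real format is not given. The check on each chunk uses len(), which counts the bytes actually read.

```python
import io
import struct

# Hypothetical record layout (the real one is not given in the question).
# "<" disables alignment padding, so the size is 4 + 1 + 2952 = 2957 bytes.
RECORD = struct.Struct("<IB2952s")
assert RECORD.size == 2957

def read_records(f):
    """Yield one unpacked tuple per fixed-size record in binary file f."""
    while True:
        chunk = f.read(RECORD.size)
        if not chunk:
            break  # clean end of file
        # len() is the number of data bytes read; sys.getsizeof() is not.
        if len(chunk) != RECORD.size:
            raise ValueError(f"truncated record: {len(chunk)} bytes")
        yield RECORD.unpack(chunk)

# In-memory stand-in for open("bin_file", "rb"):
data = RECORD.pack(1, 0, b"a" * 2952) + RECORD.pack(2, 1, b"b" * 2952)
records = list(read_records(io.BytesIO(data)))
print(len(records))    # 2
print(records[1][:2])  # (2, 1)
```

With a real file, the same generator works on the object returned by open("bin_file", "rb").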

Answer 1

You are misunderstanding what sys.getsizeof() does. It returns the amount of memory Python uses for the string object, not the length of the line.
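A quick way to see the difference is to compare the two calls on the same object. The sketch below uses bytes(2957), 2957 zero bytes, as a stand-in for what f.read(2957) returns in binary mode. The exact getsizeof value depends on the Python version and platform, so it is not stated here.

```python
import sys

# Stand-in for one line returned by f.read(2957) in binary mode.
line = bytes(2957)

print(len(line))            # 2957: the number of data bytes
print(sys.getsizeof(line))  # larger: data plus the object's header
```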

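A Python string object keeps track of its reference count, its type, and other metadata in addition to the actual characters, so the 2978 bytes are not the same as the string's length.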
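Because that metadata is a fixed-size header, the extra cost does not grow with the data. The sketch below measures getsizeof(obj) - len(obj) for bytes objects of several lengths (overhead is a hypothetical helper name). On CPython every length yields the same number, though the number itself varies by version and platform.

```python
import sys

def overhead(n):
    """Bytes reported by sys.getsizeof beyond the n data bytes of a bytes object."""
    return sys.getsizeof(b"x" * n) - n

# On CPython the header cost is independent of length, so all values match.
print({n: overhead(n) for n in (0, 1, 100, 2957)})
```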

See the definition of the type in stringobject.h:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;

Here PyObject_VAR_HEAD is defined in object.h and supplies the standard ob_refcnt, ob_type and ob_size fields.

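So a string of length 2957 needs 2958 bytes for its characters (the string length plus a terminating null). The remaining 20 bytes you see hold the reference count, the type pointer, the object's 'size' (here, the string length), the cached string hash, and the interned-state flag.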
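The arithmetic can be written out field by field. This assumes a 32-bit build, where pointers, Py_ssize_t, long and int are each 4 bytes wide. That is the case the 20-byte figure implies.

```latex
\begin{aligned}
\texttt{sys.getsizeof(line)}
  &= \underbrace{4}_{\texttt{ob\_refcnt}}
   + \underbrace{4}_{\texttt{ob\_type}}
   + \underbrace{4}_{\texttt{ob\_size}}
   + \underbrace{4}_{\texttt{ob\_shash}}
   + \underbrace{4}_{\texttt{ob\_sstate}}
   + \underbrace{2957}_{\text{data}}
   + \underbrace{1}_{\text{NUL}} \\
  &= 20 + 2958 = 2978
\end{aligned}
```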

Other object types have different memory footprints, and the exact sizes of the C types involved also vary by platform.
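To see the platform dependence, the fields that precede ob_sval can be mirrored with ctypes and their offset measured on the current machine. StringHeader is a hypothetical name. The structure only models the Python 2 layout shown above and does not inspect any real string object.

```python
import ctypes

# Hypothetical ctypes mirror of Python 2's PyStringObject layout.
# It models field sizes and offsets only; no real object is inspected.
class StringHeader(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # Py_ssize_t reference count
        ("ob_type", ctypes.c_void_p),     # pointer to the type object
        ("ob_size", ctypes.c_ssize_t),    # string length
        ("ob_shash", ctypes.c_long),      # cached hash (-1 if not computed)
        ("ob_sstate", ctypes.c_int),      # interned state
        ("ob_sval", ctypes.c_char),       # first character of the data
    ]

header = StringHeader.ob_sval.offset
# 20 on 32-bit builds, 36 on 64-bit Linux/macOS, 32 on 64-bit Windows (4-byte long).
print(header, header + 2957 + 1)
```

On a 32-bit build the second number is 2978, which matches the value in the question.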

Answer 2

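A string object that represents 2957 bytes of data needs more than 2957 bytes of memory, because of overhead such as the type pointer and the reference count. sys.getsizeof includes this extra cost.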

Why does sys.getsizeof() not return [size] for file.read([size]) in Python?
