Question

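I am trying to use Python to loop over a long binary file filled with 8-byte records.

Each record has the format [ uint16 | uint16 | uint32 ]
(which is "HHI" in struct format notation).

Apparently each 8-byte chunk is being treated as an int rather than an array of 8 bytes, which then causes the struct.unpack call to fail.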

with open(fname, "rb") as f:
    sz=struct.calcsize("HHI")
    print(sz)                # This shows 8, as expected 
    for raw in f.read(sz):   # Expect this should read 8 bytes into raw
        print(type(raw))     # This says raw is an 'int', not a byte-array
        record=struct.unpack("HHI", raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)

How can I read my file as a sequence of structs and print each one out?

Answer 1

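f.read(len) just returns a byte string. Iterating over a byte string, as the for loop does, yields one int per byte in Python 3, so raw ends up being a single byte value rather than an 8-byte record.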
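A quick way to see this behaviour without a file is sketched below. The sample record is made up for illustration and is packed with the same "HHI" format as in the question.

```python
import struct

# Hypothetical sample record: 8 bytes packed with the question's format.
data = struct.pack("HHI", 1, 2, 3)

# Iterating over a bytes object yields one int per byte,
# which is what the for loop in the question was doing.
item_types = {type(b) for b in data}
print(item_types)

# struct.unpack needs the whole bytes object, not a single element of it.
record = struct.unpack("HHI", data)
print(record)
```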

The correct way to loop is:

import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)          # read one whole 8-byte record as a bytes object
        if len(raw) != 8:
            break                # end of file; ignore the incomplete "record" if any
        record = struct.unpack("HHI", raw)
        print(record)
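If the same loop is needed in several places, it could be wrapped in a generator. The sketch below is one possible shape, not part of the original answer; the name read_records and its parameters are hypothetical.

```python
import struct

def read_records(path, fmt="HHI"):
    """Yield one unpacked tuple per complete record in the file at `path`.

    `read_records` is an illustrative helper name, not a library function.
    """
    size = struct.calcsize(fmt)          # 8 bytes for "HHI"
    with open(path, "rb") as f:
        while True:
            raw = f.read(size)           # one record's worth of bytes
            if len(raw) != size:
                break                    # end of file, or a trailing partial record
            yield struct.unpack(fmt, raw)
```

It would then be used as: for record in read_records(fname): print(record)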

Answer 2

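The built-in iter, when passed a callable and a sentinel value, calls the callable repeatedly until it returns the sentinel.

So you can create a partial function with functools.partial (or use a lambda) and pass it to iter, like this: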

import functools

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):      # Read 8-byte chunks until read() returns b'' at end of file
        pass                              # Do stuff with chunk
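Combined with struct.unpack, the same idea might look like the sketch below. It uses io.BytesIO as a stand-in for the open file so that it runs on its own, and the sample data is made up. The length check is an addition: the last chunk can be shorter than 8 bytes, and struct.unpack would raise on it.

```python
import functools
import io
import struct

FMT = "HHI"
SIZE = struct.calcsize(FMT)

# io.BytesIO stands in for a file opened with open(..., "rb"); the data is illustrative.
stream = io.BytesIO(struct.pack(FMT, 1, 2, 3) + struct.pack(FMT, 4, 5, 6))

records = []
for chunk in iter(functools.partial(stream.read, SIZE), b""):
    if len(chunk) < SIZE:
        break                      # trailing partial record, skip it
    records.append(struct.unpack(FMT, chunk))

print(records)
```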

Answer 3

I have never used this before, but it looks like an initialization problem:

import struct

with open(fname, "rb") as f:
    fmt = 'HHI'
    raw = struct.pack(fmt, 1, 2, 3)   # initialize raw as an 8-byte bytes object
    size = struct.calcsize(fmt)       # named size to avoid shadowing the built-in len
    print(size)                       # This shows 8, as expected
    while True:
        raw = f.read(size)            # read a whole record instead of iterating over its bytes
        if len(raw) != size:
            break                     # end of file or trailing partial record
        record = struct.unpack(fmt, raw)
        print(record)

If you have enough memory, you may want to look at struct.iter_unpack() as an optimization.
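A minimal sketch of that approach follows, assuming the whole file has already been read into memory (for example with f.read()). The helper name unpack_all and the sample bytes are hypothetical. struct.iter_unpack requires the buffer length to be a multiple of the record size, so any trailing partial record is trimmed first.

```python
import struct

FMT = "HHI"

def unpack_all(data, fmt=FMT):
    """Unpack every complete record in `data` (illustrative helper, not a library call)."""
    size = struct.calcsize(fmt)
    usable = len(data) - len(data) % size    # drop a trailing partial record, if any
    return list(struct.iter_unpack(fmt, data[:usable]))

# Made-up sample: two full records followed by one stray byte.
sample = struct.pack(FMT, 1, 2, 3) + struct.pack(FMT, 4, 5, 6) + b"\xff"
records = unpack_all(sample)
print(records)
```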

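Note that in Python 3.7 the type of the format string stored on a compiled struct.Struct object (its format attribute) changed from bytes to str. See the end of the page at https://docs.python.org/3/library/struct.html#struct.pack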
