Trying to loop through a binary file in Python

Date: 2019-03-04 18:27:29

Tags: python

I'm trying to use Python to iterate over a long binary file full of 8-byte records.

Each record has the format [ uint16 | uint16 | uint32 ]
("HHI" in struct format notation).
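For reference, a single record in this layout can be packed and unpacked with struct directly. This is a minimal sketch that assumes the file is little-endian and therefore uses the explicit "<HHI" format (standard sizes, no padding); plain "HHI" uses the byte order and alignment of the machine running the script, which also gives 8 bytes for this layout on common platforms. The values 1, 2, 3 are made up for illustration.

```python
import struct

# Assumption: records are stored little-endian ("<"); plain "HHI" would use native order.
RECORD_FMT = "<HHI"

print(struct.calcsize(RECORD_FMT))  # 8 bytes: 2 + 2 + 4, no padding

# Pack one hypothetical record and unpack it again.
raw = struct.pack(RECORD_FMT, 1, 2, 3)
print(raw)                             # b'\x01\x00\x02\x00\x03\x00\x00\x00'
print(struct.unpack(RECORD_FMT, raw))  # (1, 2, 3)
```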

Apparently each 8-byte chunk is being treated as an int rather than as an 8-byte array, which then makes the struct.unpack call fail:

import struct

with open(fname, "rb") as f:
    sz=struct.calcsize("HHI")
    print(sz)                # This shows 8, as expected
    for raw in f.read(sz):   # Expect this should read 8 bytes into raw
        print(type(raw))     # This says raw is an 'int', not a byte-array
        record=struct.unpack("HHI", raw ) # "TypeError: a bytes-like object is required, not 'int'"
        print(record)

How can I read my file as a series of structs and print each one out?

3 answers:

Answer 0 (score: 2)

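f.read(len) returns just one bytes string. Looping over it with for raw in f.read(sz) therefore visits that string one byte at a time, and in Python 3 each of those items is an int.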
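A quick illustration of that point, using an in-memory record instead of a file (little-endian "<HHI" is assumed for the sample data): iterating over a bytes object yields one int per byte, while slicing it, like calling f.read(8), keeps the bytes type that struct.unpack needs.

```python
import struct

data = struct.pack("<HHI", 1, 2, 3)  # one 8-byte record (little-endian assumed)

# Iterating over bytes yields ints, one per byte; this is what the question's loop does.
items = [b for b in data]
print(type(items[0]))  # <class 'int'>
print(len(items))      # 8

# Slicing (or f.read(8)) returns a bytes object that struct.unpack accepts.
chunk = data[0:8]
print(type(chunk))                   # <class 'bytes'>
print(struct.unpack("<HHI", chunk))  # (1, 2, 3)
```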

The correct way to loop is:

import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)
        if len(raw) != 8:
            break  # ignore the incomplete "record" if any
        record = struct.unpack("HHI", raw)
        print(record)
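The same read-until-short loop can be wrapped in a generator and exercised on an in-memory file. This is a sketch under a few assumptions: the helper name read_records is hypothetical, the records are taken to be little-endian ("<HHI"), and a precompiled struct.Struct is used so the format is parsed only once and its size attribute replaces the hard-coded 8.

```python
import io
import struct

# Precompile the format once; Struct.size is the record length (8 for "<HHI").
# Assumption: little-endian records; use "HHI" for native order as in the question.
RECORD = struct.Struct("<HHI")

def read_records(f):
    """Hypothetical helper: yield one tuple per complete record from a binary file object."""
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:
            break  # end of file, or a trailing partial record that is ignored
        yield RECORD.unpack(raw)

# Usage with an in-memory file: two full records plus 3 stray bytes.
buf = io.BytesIO(RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6) + b"\x00\x00\x00")
records = list(read_records(buf))
print(records)  # [(1, 2, 3), (4, 5, 6)]
```

With a real file, the call becomes for record in read_records(f): inside the with open(fname, 'rb') as f: block.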

Answer 1 (score: 2)

The built-in iter, when passed a callable and a sentinel value, calls the callable repeatedly until it returns the sentinel value.

So you can use functools.partial (or a lambda) to create a partial function and pass it to iter, like this:

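import functools
import struct

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):      # Read 8-byte chunks until an empty bytes object is returned
        # Do stuff with chunk, e.g. decode it (a short final chunk would raise struct.error)
        print(struct.unpack("HHI", chunk))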
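The same pattern with a lambda instead of functools.partial, wrapped in a generator so it can be tried on an in-memory file. The helper name iter_records and the little-endian "<HHI" format are assumptions for illustration. The length check is also an addition: iter stops only when the callable returns exactly the sentinel b'', so a short final chunk would still be yielded and struct.unpack would raise struct.error on it.

```python
import io
import struct

FMT = "<HHI"                 # assumption: little-endian records
SIZE = struct.calcsize(FMT)  # 8

def iter_records(f):
    # iter(callable, sentinel) calls f.read(SIZE) until it returns b"" (end of file).
    for chunk in iter(lambda: f.read(SIZE), b""):
        if len(chunk) != SIZE:
            break  # skip a trailing partial record instead of letting unpack fail
        yield struct.unpack(FMT, chunk)

buf = io.BytesIO(struct.pack(FMT, 7, 8, 9) + struct.pack(FMT, 10, 11, 12))
print(list(iter_records(buf)))  # [(7, 8, 9), (10, 11, 12)]
```

functools.partial(f.read, SIZE) and lambda: f.read(SIZE) behave the same here; partial avoids a closure, while the lambda needs no extra import.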

Answer 2 (score: 0)

I have never used it before, but it looks like an initialization problem:

import struct

with open(fname, "rb") as f:
    fmt = 'HHI'
    raw = struct.pack(fmt, 1, 2, 3)
    size = struct.calcsize(fmt)
    print(size)               # This shows 8, as expected
    for raw in f.read(size):  # Expect this should read 8 bytes into raw
        print(type(raw))      # This says raw is an 'int', not a byte-array
        record = struct.unpack(fmt, raw)  # "TypeError: a bytes-like object is required, not 'int'"
        print(record)

If you have enough memory, you may want to look at struct.iter_unpack() as an optimization.
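A minimal sketch of the iter_unpack approach, assuming the whole file fits in memory and the records are little-endian ("<HHI"); the helper name unpack_all is hypothetical. struct.iter_unpack requires the buffer length to be an exact multiple of the record size, so any trailing partial record is trimmed off first.

```python
import io
import struct

FMT = "<HHI"  # assumption: little-endian records
SIZE = struct.calcsize(FMT)

def unpack_all(f):
    data = f.read()                         # reads the whole file into memory
    usable = len(data) - len(data) % SIZE   # drop a trailing partial record, if any
    # iter_unpack needs a buffer whose length is a multiple of SIZE.
    return list(struct.iter_unpack(FMT, data[:usable]))

buf = io.BytesIO(struct.pack(FMT, 1, 2, 3) + struct.pack(FMT, 4, 5, 6) + b"\xff")
print(unpack_all(buf))  # [(1, 2, 3), (4, 5, 6)]
```

To print records one by one without building a list, loop directly over struct.iter_unpack(FMT, data[:usable]).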

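Note that in Python 3.7 the type of the format string (the Struct.format attribute) changed from bytes to str. See the end of the page at https://docs.python.org/3/library/struct.html#struct.pack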
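A small check of that change, assuming Python 3.7 or later: the format attribute of a compiled struct.Struct comes back as a str, where older versions returned bytes.

```python
import struct

s = struct.Struct("HHI")
# On Python 3.7+ this attribute is a str ('HHI'); earlier versions returned bytes (b'HHI').
print(type(s.format), s.format)
```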