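I'm trying to use Python to iterate over a long binary file full of 8-byte records.
Each record has the format [ uint16 | uint16 | uint32 ] (which is "HHI" in struct format notation).
Apparently each 8-byte chunk is being treated as an int rather than as an array of 8 bytes, which then causes the struct.unpack call to fail: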
import struct

with open(fname, "rb") as f:
    sz = struct.calcsize("HHI")
    print(sz)  # This shows 8, as expected
    for raw in f.read(sz):  # Expect this should read 8 bytes into raw
        print(type(raw))  # This says raw is an 'int', not a byte-array
        record = struct.unpack("HHI", raw)  # "TypeError: a bytes-like object is required, not 'int'"
        print(record)
How can I read my file as a sequence of structs and print each one out?
Answer 0 (score: 2)
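f.read(sz) returns a single bytes object. Iterating over that object yields one item per byte, and in Python 3 each item is an int, so raw is one int rather than an 8-byte record.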
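A small sketch of why the loop in the question sees ints, assuming Python 3. The sample record is made up for illustration and is built in memory instead of being read from a file.

```python
import struct

# Hypothetical sample: one record packed with the question's format.
data = struct.pack("HHI", 1, 2, 3)

# Iterating over a bytes object yields ints (0..255), one per byte.
item_types = {type(b).__name__ for b in data}

# struct.unpack needs the whole 8-byte bytes object, not its individual items.
record = struct.unpack("HHI", data)

print(len(data), item_types, record)
```

So the for loop in the question runs once per byte, not once per record.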
The correct way to loop is:
import struct

with open(fname, 'rb') as f:
    while True:
        raw = f.read(8)
        if len(raw) != 8:
            break  # ignore the incomplete "record" if any
        record = struct.unpack("HHI", raw)
        print(record)
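The same loop can be wrapped in a generator so that callers simply iterate over records. This is only a sketch: the name iter_records is hypothetical, and io.BytesIO stands in for a real file opened in "rb" mode.

```python
import io
import struct


def iter_records(f, fmt="HHI"):
    """Yield one unpacked tuple per complete record read from binary file f."""
    record = struct.Struct(fmt)  # compile the format once and reuse it
    while True:
        raw = f.read(record.size)
        if len(raw) != record.size:  # end of file, or a truncated trailing record
            return
        yield record.unpack(raw)


# Stand-in for a real file: two full records followed by 3 stray bytes.
sample = io.BytesIO(
    struct.pack("HHI", 1, 2, 3) + struct.pack("HHI", 4, 5, 6) + b"\x00\x00\x00"
)
records = list(iter_records(sample))
print(records)
```

Using struct.Struct keeps the record size and the format in one place, so the literal 8 does not have to be repeated.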
Answer 1 (score: 2)
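The built-in iter, when passed a callable and a sentinel value, calls the callable repeatedly until it returns the sentinel.
So you can use functools.partial (or a lambda) to create a partial function and pass it to iter, like this: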
import functools

with open('foo.bin', 'rb') as f:
    chunker = functools.partial(f.read, 8)
    for chunk in iter(chunker, b''):  # Read 8-byte chunks until an empty bytes object is returned
        pass  # Do stuff with chunk
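To connect this back to the question, here is one way the chunks might be unpacked. It is a sketch under assumptions: read_records is a hypothetical helper, and io.BytesIO replaces the real file. Note that the b'' sentinel only stops on an empty read, so a short final chunk still reaches the loop body and has to be checked explicitly.

```python
import functools
import io
import struct

RECORD = struct.Struct("HHI")  # the record layout from the question


def read_records(f):
    """Return a list of unpacked records, skipping a truncated trailing chunk."""
    chunker = functools.partial(f.read, RECORD.size)
    records = []
    for chunk in iter(chunker, b""):
        if len(chunk) < RECORD.size:
            break  # incomplete final record: unpack would raise struct.error
        records.append(RECORD.unpack(chunk))
    return records


# Stand-in for a real file: two full records plus one stray byte.
sample = io.BytesIO(RECORD.pack(10, 20, 30) + RECORD.pack(40, 50, 60) + b"\xff")
result = read_records(sample)
print(result)
```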
Answer 2 (score: 0)
I've never used this before, but it looks like an initialization problem:
import struct

with open(fname, "rb") as f:
    fmt = 'HHI'
    raw = struct.pack(fmt, 1, 2, 3)  # a bytes object the size of one record
    size = struct.calcsize(fmt)  # record size in bytes
    print(size)  # This shows 8, as expected
    while True:
        raw = f.read(size)  # Read one whole record as a bytes object
        if len(raw) != size:
            break  # End of file, or an incomplete trailing record
        record = struct.unpack(fmt, raw)
        print(record)
If you have enough memory to hold the data, you may want to look at struct.iter_unpack() as an optimization.
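A minimal sketch of that approach, assuming the whole file fits in memory. The sample bytes are made up and io.BytesIO stands in for the real file. struct.iter_unpack requires the buffer length to be a multiple of the record size, so any trailing bytes are trimmed first.

```python
import io
import struct

fmt = "HHI"
size = struct.calcsize(fmt)

# Stand-in for a real file: two full records plus one stray byte.
sample = io.BytesIO(struct.pack(fmt, 1, 2, 3) + struct.pack(fmt, 4, 5, 6) + b"\x00")
data = sample.read()  # read everything at once

# Drop an incomplete trailing record; iter_unpack raises struct.error otherwise.
usable = len(data) - (len(data) % size)
records = list(struct.iter_unpack(fmt, data[:usable]))
print(records)
```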
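Note that in Python 3.7 the type of the format string stored on compiled Struct objects (the format attribute) changed from bytes to str. See the end of the page https://docs.python.org/3/library/struct.html#struct.pack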