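One of the answers to this question says that the following is a good way to read a large binary file without first reading the whole thing into memory: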
    with open(image_filename, 'rb') as content:
        for line in content:
            pass  # do anything you want
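I thought the whole point of specifying 'rb' was that line endings are ignored, so how does for line in content work?
Is this the most "Pythonic" way to read a large binary file, or is there a better way?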
Answer 0: (score: 4)
I would write a simple helper function that reads the file in chunks of whatever size you want:
    def read_in_chunks(infile, chunk_size=1024):
        while True:
            chunk = infile.read(chunk_size)
            if chunk:
                yield chunk
            else:
                # The chunk was empty, which means we're at the end
                # of the file
                return
Then use it the same way you would use for line in file:
    with open(fn, 'rb') as f:
        for chunk in read_in_chunks(f):
            pass  # do your stuff on that chunk...
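To show what the generator yields in practice, here is a minimal sketch that writes 2500 bytes to a temporary file and records the size of each chunk read back. The names chunk_sizes and demo are hypothetical, made up for this illustration, and the temporary file only exists to make the example self-contained.

```python
import os
import tempfile


def read_in_chunks(infile, chunk_size=1024):
    """Yield successive chunks from a file object until end of file."""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            # An empty chunk means we have reached the end of the file.
            return
        yield chunk


def chunk_sizes(path, chunk_size=1024):
    """Hypothetical helper: return the length of every chunk read from path."""
    with open(path, 'rb') as f:
        return [len(chunk) for chunk in read_in_chunks(f, chunk_size)]


def demo():
    # Write 2500 zero bytes, then read them back 1024 bytes at a time.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'\x00' * 2500)
    try:
        return chunk_sizes(tmp.name)
    finally:
        os.remove(tmp.name)
```

Only one chunk is held in memory at a time, and the last chunk is simply whatever is left over, so it can be shorter than chunk_size.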
BTW: I asked THIS question five years ago, and this is a variant of an answer I got back then...
You can also do this:
    import functools

    with open(fn, 'rb') as f:
        # numBytes is the chunk size; in binary mode read() returns b'' at end of file
        for chunk in iter(functools.partial(f.read, numBytes), b''):
            ...  # process the chunk
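One thing to watch with the two-argument form of iter: in Python 3, read() on a binary file returns bytes, and '' never compares equal to b'', so a str sentinel would make the loop run forever. A rough sketch of the pattern, using hashing as the per-chunk work (sha256_of_stream is a hypothetical name, and io.BytesIO stands in for a file opened with 'rb'):

```python
import functools
import hashlib
import io


def sha256_of_stream(stream, num_bytes=4096):
    """Hypothetical helper: hash a binary stream chunk by chunk."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) calls stream.read(num_bytes) repeatedly and
    # stops as soon as it returns b'', which signals end of file.
    for chunk in iter(functools.partial(stream.read, num_bytes), b''):
        digest.update(chunk)
    return digest.hexdigest()


# io.BytesIO is an assumption made here so that no file on disk is needed.
data = b'abc' * 10000
streamed = sha256_of_stream(io.BytesIO(data))
```

The streamed digest should match hashlib.sha256(data).hexdigest(), since feeding the hash in chunks is equivalent to feeding it all at once.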
Answer 1: (score: 3)
for line in fh always splits on newlines, whatever the mode.
With binary files you would normally consume the data in chunks instead:
    CHUNK_SIZE = 1024
    # The sentinel is b'' rather than "": read() returns bytes in binary mode
    for chunk in iter(lambda: fh.read(CHUNK_SIZE), b''):
        do_something(chunk)
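Answer 2: (score: 2)
Binary mode means that line endings are not converted and that bytes objects are read (in Python 3); the file will still be read "line" by "line" when you use for line in f. I would use read to read consistently sized chunks instead, though.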
    with open(image_filename, 'rb') as f:
        # iter(callable, sentinel): yields f.read(4096) until b'' is returned
        for chunk in iter(lambda: f.read(4096), b''):
            ...
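To make the difference concrete, here is a small sketch comparing the two approaches on the same data. io.BytesIO is used in place of a real file opened with 'rb' (an assumption, purely to keep the example self-contained); it splits lines on b'\n' the same way.

```python
import io

raw = b'first\r\nsecond\nno trailing newline'

# Iterating a binary stream still yields "lines": bytes objects that end at
# b'\n'. Nothing is translated, so the b'\r' of a Windows line ending stays.
lines = list(io.BytesIO(raw))

# Reading fixed-size chunks ignores newlines entirely.
stream = io.BytesIO(raw)
chunks = list(iter(lambda: stream.read(8), b''))
```

For real binary data such as an image, line lengths are unpredictable because b'\n' bytes can appear anywhere or not at all, which is why fixed-size chunks are usually the safer choice.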