Given a list of byte ranges that must be skipped:
skip_ranges = [(1, 3), (5, 7)]
and a binary file:
f = open('test', 'rb')
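What is the fastest way to return the file's contents without bytes 1-3 and 5-7 (both ranges inclusive), without modifying the original file?
Input (file contents):
012345678
Output:
048
Note that this question is about (potentially large) binary files, so a generator would be best.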
Answer 0: (score: 2)
You said the file may be large, so I adapted @juanpa.arrivillaga's solution to read the file in chunks and yield each chunk from a generator:
def read_ranges(filename, skip_ranges, chunk_size=1024):
    with open(filename, 'rb') as f:
        prev = -1
        for start, stop in skip_ranges:
            end = start - prev - 1
            # Go to next skip-part in chunk_size steps
            while end > chunk_size:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data
                end -= chunk_size
            # Read last bit that didn't fit in chunk
            yield f.read(end)
            # Seek to next skip
            f.seek(stop + 1, 0)
            prev = stop
        else:
            # Read remainder of file in chunks
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data

print(list(read_ranges('test', skip_ranges)))
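To sanity-check the chunked approach against the sample from the question without touching the disk, a self-contained sketch along the following lines may help. It takes any seekable binary file object, so io.BytesIO stands in for the real file. The name iter_kept_chunks is hypothetical and not part of the answer above, and the sketch assumes skip_ranges is sorted, non-overlapping, and uses inclusive (start, stop) offsets.

```python
import io


def iter_kept_chunks(f, skip_ranges, chunk_size=1024):
    """Yield the bytes of seekable binary file f that lie outside skip_ranges.

    Assumption: skip_ranges is sorted, non-overlapping, and holds
    inclusive (start, stop) byte offsets, as in the question.
    """
    pos = 0
    for start, stop in skip_ranges:
        remaining = start - pos  # bytes to keep before this skip range
        while remaining > 0:
            data = f.read(min(chunk_size, remaining))
            if not data:  # reached end of file early
                return
            yield data
            remaining -= len(data)
        pos = stop + 1
        f.seek(pos)  # jump over the skipped range
    # Everything after the last skip range, again in chunks
    while True:
        data = f.read(chunk_size)
        if not data:
            return
        yield data


sample = io.BytesIO(b'012345678')
result = b''.join(iter_kept_chunks(sample, [(1, 3), (5, 7)]))
print(result)  # b'048'
```

Joining the chunks is only done here to compare against the expected output. For a large file you would more likely write each chunk to its destination as it is yielded.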
Answer 1: (score: 1)
This approach should be relatively fast:
ba = bytearray()
with open('test.dat', 'rb') as f:
    prev = -1
    for start, stop in skip_ranges:
        ba.extend(f.read(start - prev - 1))
        f.seek(stop + 1, 0)
        prev = stop
    else:
        ba.extend(f.read())
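For reuse, the same read-and-seek loop can be wrapped in a function and tried on the sample from the question. The sketch below is only an illustration: the name read_without_ranges and the temporary file are assumptions, and it assumes skip_ranges is sorted and non-overlapping (an overlapping range would make the read length negative, which reads to the end of the file).

```python
import os
import tempfile


def read_without_ranges(path, skip_ranges):
    """Return the file's bytes minus the inclusive (start, stop) ranges.

    Assumption: skip_ranges is sorted and non-overlapping.
    """
    ba = bytearray()
    with open(path, 'rb') as f:
        prev = -1
        for start, stop in skip_ranges:
            ba.extend(f.read(start - prev - 1))  # keep bytes before the range
            f.seek(stop + 1)  # skip the range itself
            prev = stop
        ba.extend(f.read())  # keep whatever follows the last range
    return bytes(ba)


# Write the sample from the question to a temporary file and filter it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'012345678')
try:
    filtered = read_without_ranges(tmp.name, [(1, 3), (5, 7)])
    print(filtered)  # b'048'
finally:
    os.remove(tmp.name)
```

Note that this variant holds the entire result in memory, so it suits files that fit comfortably in RAM. For very large files a chunked generator is the safer choice.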