This is the Python code I'm using. I have a 5 GB file that I need to split into roughly 10-12 files by line number, but this code raises a memory error. Can someone tell me what's wrong with it?
from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

n = 386972
with open('reviewsNew.txt', 'rb') as f:
    for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
        with open('small_file_{0}'.format(i * n), 'w') as fout:
            fout.writelines(g)
Answer 0: (score: 0)
Just use groupby instead. Your grouper builds a list of 386,972 references to the same iterator and passes them all to izip_longest as separate arguments; with groupby you don't need to create those 386,972 iterator arguments at all:
from itertools import groupby

n = 386972
with open('reviewsNew.txt', 'rb') as f:
    # group consecutive lines by which chunk (idx // n) they fall into
    for idx, lines in groupby(enumerate(f), lambda (idx, _): idx // n):
        with open('small_file_{0}'.format(idx * n), 'wb') as fout:
            fout.writelines(l for _, l in lines)