The huge file looks like this:
    @delimiter...xxxxxxx 1st line
    atgccccccccccccccc... 2nd line
    + 3rd line
    agtrc!%%^*()_+!... 4th line
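These four lines repeat throughout the file. The delimiter can appear in the 1st line. If the delimiter is in the 1st line, I want to write out those 4 lines.
Here is my code: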
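    # `delimiter` is assumed to be defined earlier as a string.
    with open("hugefile") as fin, open("hugefile_out", "w") as fout:
        for line in fin:
            if delimiter in line:
                # Names cannot start with a digit, so spell them out.
                first_line = line
                second_line = next(fin)
                third_line = next(fin)
                fourth_line = next(fin)
                fout.write(first_line + second_line + third_line + fourth_line)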
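It usually takes 4 to 5 hours to finish this job. (I removed one function.) Is there any way to make it faster? (I use pypy.) The input file is 1~100Gb, so the repeated code seems unnecessary.
Maybe something like this?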
    fout.write(line + next(fin) + next(fin) + next(fin))
Thanks for your help!
Answer 0 (score: 2)
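I suggest the following approach: set a flag when the delimiter is seen, write every line while the flag is set, and reset the flag once four lines have been written.
So the code would look like this: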
    sawDelim = False
    idx = 1
    # Open the output file in write mode; `delimiter` is assumed to be defined.
    with open("hugefile") as fin, open("hugefile_out", "w") as fout:
        for line in fin:
            if delimiter in line:
                sawDelim = True
            if sawDelim:
                fout.write(line)
                idx += 1
                # now that we've printed out 4 lines, reset and keep looking
                # (or could also bail if you want to only find one set)
                if idx > 4:
                    idx = 1
                    sawDelim = False
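A possible variation, sketched under a few assumptions: if every record is exactly four lines long, the three follow-up lines can be pulled straight from the file iterator with `itertools.islice` rather than tracked with a counter. The function name `copy_records`, the `record_len` parameter, and the sample data are hypothetical and only for illustration. Whether this is actually faster on a 1~100Gb input would need to be measured.

```python
import io
from itertools import islice


def copy_records(fin, fout, delimiter, record_len=4):
    """Copy each record whose first line contains `delimiter`; return the count."""
    copied = 0
    for line in fin:
        if delimiter in line:
            # Write the matching line, then up to record_len - 1 following lines.
            # islice stops quietly at end of file, so a truncated final record
            # does not raise StopIteration.
            fout.write(line)
            fout.writelines(islice(fin, record_len - 1))
            copied += 1
    return copied


# Small in-memory demonstration; the sample data is invented.
sample = io.StringIO(
    "@keep read1\nACGT\n+\n!!!!\n"
    "@skip read2\nTTTT\n+\n####\n"
)
out = io.StringIO()
count = copy_records(sample, out, "@keep")
```

With real files, the same function could be called as `copy_records(fin, fout, delimiter)` inside `with open("hugefile") as fin, open("hugefile_out", "w") as fout:`. As in the code above, the lines consumed after a match are not themselves checked for the delimiter.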