I have a huge text file (models.txt) that contains lines like the following:
Model 1
text
text
text
text
END
Model 2
text
text
text
text
END
Model 3
text
text
text
text
END
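I want to write a function that takes "Model 1", "Model 2" and "Model 3" as start points and "END" as the end point, and writes each block to its own output file: model_1.txt, model_2.txt and model_3.txt.
Since I don't know much about programming, this is what I wrote: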
a = open('C:/Users/Zebrafish/Desktop/AHR_human_modeling/human/edited/1AHH.B99990013.pdb','r')
lines = a.readlines()
x = 1
for line in lines:
    if 'END' in line:
        PDB_file = open('C:/Users/Zebrafish/Desktop/AHR_human_modeling/human/edited/model_1.pdb','w')
        PDB_file.write(line)
        PDB_file.close()
Answer 0 (score: 4)
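from itertools import groupby

with open('infile') as f:
    # Group consecutive lines by whether they are blank; this relies on
    # the blocks being separated by blank lines in the input file.
    groups = groupby(f, key=str.isspace)
    for k, lines in groups:
        if k:
            continue  # skip the runs of blank separator lines
        # The first line of a block ("Model 1") becomes the file name ("model_1.txt")
        fname = next(lines).strip().lower().replace(' ', '_')+'.txt'
        with open(fname, 'w') as outf:
            # The header line was consumed by next(), so only the rest is written
            outf.writelines(lines)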
Answer 1 (score: 0)
If your file fits in memory, you can use a regular expression to split it into blocks and then iterate over the matches:
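import re

with open('models.txt') as handle:
    # Non-greedy match from each "Model" to the next "END"; DOTALL lets "." cross newlines
    models = re.findall("Model.*?END", handle.read(), re.MULTILINE|re.DOTALL)
# Count from 1 so the file names line up with "Model 1", "Model 2", ...
for i, model in enumerate(models, start=1):
    # The output file has to be opened in write mode
    with open('model_%s.txt' % i, 'w') as output:
        output.write(model)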
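If the file is too large to read into memory, the same split can probably be done one line at a time. Below is a minimal sketch, assuming every block starts with a line beginning with "Model" and ends with a line that is exactly "END". The function name split_models and its out_dir parameter are made up for illustration and do not come from the question or the answers above.

```python
import os


def split_models(path, out_dir="."):
    """Write each Model ... END block of `path` to its own file in `out_dir`.

    Hypothetical helper: returns the list of file paths it wrote.
    """
    written = []
    out = None  # currently open output file, or None when outside a block
    try:
        with open(path) as src:
            for line in src:
                stripped = line.strip()
                if out is None:
                    if stripped.startswith("Model"):
                        # "Model 1" -> "model_1.txt"
                        name = stripped.lower().replace(" ", "_") + ".txt"
                        target = os.path.join(out_dir, name)
                        out = open(target, "w")
                        written.append(target)
                        out.write(line)
                    # lines outside any block are ignored
                else:
                    out.write(line)
                    if stripped == "END":
                        # end marker reached: finish this block's file
                        out.close()
                        out = None
    finally:
        if out is not None:
            out.close()  # input ended (or failed) before a closing END
    return written
```

Calling split_models('models.txt') would write model_1.txt, model_2.txt and so on into the current directory. Unlike the regex version, only one line is held in memory at a time, and the "Model" and "END" lines are both kept in each output file.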