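I have a text file containing a large amount of data. I want to read it and write a new text file, but the new file should leave out certain parts of the original.
For example, the text file contains: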
------------------------
Age: 39
Gender: Female
Smoking: Yes
remarks: something about the person
-----------------------
Age: 52
Gender: Male
Smoking: Yes
remarks: something about the person
-----------------------
How can I make the new file keep only the age and gender, so that it looks like this (including the dashes that separate each entry)?
-----------------------
Age: 39
Gender: Female
-----------------------
Age: 52
Gender: Male
-----------------------
I have looked at several code snippets and other questions, but none of them simply removes specific lines.
Answer 0 (score: 5)
with open('path/to/infile') as infile, open('path/to/outfile', 'w') as outfile:
    for line in infile:
        # Copy only the Age, Gender and separator lines
        if line.startswith(("Age", "Gender", "----")):
            outfile.write(line)
Or use grep:
grep -ioP '^-.*$|^Age:.*$|^Gender:.*$' path/to/infile.txt > path/to/outfile.txt
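If the set of fields to keep is likely to change, the Python loop above can be wrapped in a small function. This is only a sketch: the names keep_fields and filter_file are made up for illustration, and it assumes every field name appears at the very start of its line, as in the sample data.

```python
def keep_fields(lines, prefixes=("Age", "Gender", "----")):
    """Yield only the lines that start with one of the given prefixes."""
    prefixes = tuple(prefixes)  # str.startswith needs a str or a tuple of str
    for line in lines:
        if line.startswith(prefixes):
            yield line


def filter_file(in_path, out_path, prefixes=("Age", "Gender", "----")):
    """Copy the matching lines from in_path to out_path (hypothetical helper)."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        outfile.writelines(keep_fields(infile, prefixes))
```

Calling filter_file('path/to/infile', 'path/to/outfile') should then behave like the loop above, and passing a different prefixes tuple keeps other fields.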
Answer 1 (score: 0)
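import re

# Read the whole file as text so the pattern can match across line breaks
with open('filename.txt') as f:
    file = f.read()
# Capture each consecutive Age/Gender pair as a (age, gender) tuple
a = re.findall(r'Age: (\d+)\nGender: (Male|Female)', file)
print("-----------------------")
for n in a:
    print('Age: ' + n[0] + '\nGender: ' + n[1])
    print("-----------------------")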
You can be even lazier and capture the dashes in the regular expression as well:
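# (?:.*\n){3} skips the rest of the Gender line plus the Smoking and remarks lines;
# (\-*) then captures the separator line that follows each entry
a = re.findall(r'Age: (\d+)\nGender: (Male|Female)(?:.*\n){3}(\-*)', file)
for n in a:
    print("Age: " + n[0] + "\nGender: " + n[1] + "\n" + n[2])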
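Since the question asks for a new file rather than printed output, here is a rough sketch of the same regex idea that builds the output text, which can then be written to a file. The names RECORD, SEPARATOR and extract_records are my own, and it assumes every entry has Age immediately followed by Gender, with Gender being exactly Male or Female; entries that do not match are silently skipped.

```python
import re

# Same pattern as above: an Age line immediately followed by a Gender line
RECORD = re.compile(r"Age: (\d+)\nGender: (Male|Female)")
SEPARATOR = "-" * 23  # assumed width of the dashed divider


def extract_records(text):
    """Rebuild the text from (age, gender) pairs, with a divider around each entry."""
    parts = [SEPARATOR]
    for age, gender in RECORD.findall(text):
        parts.append("Age: " + age)
        parts.append("Gender: " + gender)
        parts.append(SEPARATOR)
    return "\n".join(parts) + "\n"
```

To produce the file, read the input with open('filename.txt').read(), pass it to extract_records, and write the result to the output path opened in 'w' mode.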