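I want to remove specific strings and blank lines from a text file. This follows on from my earlier question. I referred to some examples and solutions from the SO experts here: removing the strings works fine, but removing the blank lines does not. To make it easier to understand, I focus on that problem here.
Some parts of the text file contain stringA, stringB and stringC lines, followed by blank lines. Only the one line directly below them should be removed.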
line0
line1 stringAxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line2 stringBxxxxxxxxxxxxxxxxxxxxxxx
line3 stringCxxxxxxxxxxxxxxxxxxx
line4
line5
line6 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8
line9 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line11 stringBxxxxxxxxxxxxxxxxxxxxxxx
line12 stringCxxxxxxxxxxxxxxxxxxx
line13
line14
line15 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17
line18 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23
line24 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line26 stringBxxxxxxxxxxxxxxxxxxxxxxx
line27 stringCxxxxxxxxxxxxxxxxxxx
line28
line29
line30 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32
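In this case, remove every line that contains stringA, stringB or stringC, plus the one line that follows them. For example, in the sample above, remove lines 1, 2, 3, 4, lines 11, 12, 13, and lines 26, 27, 28.
I tried using strip(), but it removed all the blank lines. Here is the script I used; it does remove all lines containing stringA, stringB and stringC.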
import re

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        # Drop lines that start with whitespace followed by a marker string
        if not re.match(r"\s+(stringA|stringB|stringC)", line):
            fout.write(line)
Expected output
line0
line5
line6 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8
line9 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line14
line15 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17
line18 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23
line24 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line29
line30 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32
Thank you for your kind help.
Answer 0 (score: 1)
I'm pretty sure this is not the best answer, but a "flag-like" approach works:
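import re

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
flag = 0  # 1 while the previous line contained a marker string
with open('clean.txt', 'w') as fout:
    for line in lines:
        if not re.match(r'.*(stringA|stringB|stringC)', line):
            # Write the line unless it directly follows a marker line
            if not flag:
                fout.write(line)
            flag = 0
        else:
            # Marker line: drop it and remember to drop the next line too
            flag = 1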
Hope this helps.
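The same flag idea can be wrapped in a function so that it can be checked without touching any files. This is only a minimal sketch: the name `drop_marked_blocks`, the `skip_next` flag and the sample data are made up for illustration, and it uses `re.search` in place of the `.*` prefix in the pattern above.

```python
import re

# Hypothetical helper for illustration; not part of the original answer.
MARKER = re.compile(r"stringA|stringB|stringC")


def drop_marked_blocks(lines):
    """Return lines without marker lines and the single line after each marker run."""
    kept = []
    skip_next = False  # True while the previous line was a marker line
    for line in lines:
        if MARKER.search(line):
            skip_next = True   # drop the marker line itself
            continue
        if skip_next:
            skip_next = False  # drop exactly one line after the marker run
            continue
        kept.append(line)
    return kept


# Made-up sample: one marker block followed by two blank lines and some text.
sample = ["line0\n", " stringA xxx\n", " stringB xxx\n", " stringC xxx\n",
          "\n", "\n", "text\n"]
print(drop_marked_blocks(sample))  # ['line0\n', '\n', 'text\n']
```

Only the first blank line after the block is dropped; the second one survives, which is the behaviour asked for in the question.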
Answer 1 (score: 1)
An optimized solution:
import re

with open('raw.txt', 'r') as fin, open('clean.txt', 'w') as fout:
    string_c_pat = re.compile(r'\s+stringC')
    pat = re.compile(r"\s+(stringA|stringB|stringC)")
    for line in fin:  # traversing the file as an iterator
        if string_c_pat.match(line):
            # consume the line that follows a `stringC` line;
            # the None default avoids StopIteration at end of file
            next(fin, None)
        if not pat.match(line):
            fout.write(line)
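Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.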
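To make the two ideas in this answer easier to try out (one pattern compiled once, and `next()` used to skip the line after `stringC`), here is a rough sketch as a generator. The names `PAT` and `clean_lines` are hypothetical, the pattern uses `\s*` so the made-up sample also works without leading whitespace, and like the code above it assumes every block ends with a `stringC` line.

```python
import re
from io import StringIO

# Compiled once and reused for every line (hypothetical name).
PAT = re.compile(r"\s*(stringA|stringB|stringC)")


def clean_lines(source):
    """Yield lines from source, dropping marker lines and the line after each stringC."""
    it = iter(source)
    for line in it:
        m = PAT.match(line)
        if m:
            if m.group(1) == "stringC":
                # Consume the following line; the None default means
                # no StopIteration if stringC is the last line.
                next(it, None)
            continue  # never emit a marker line
        yield line


# Made-up input; a real file object opened with open() can be passed the same way.
src = StringIO("line0\n stringB x\n stringC x\n\n\ntext\n stringC x\n")
print(list(clean_lines(src)))  # ['line0\n', '\n', 'text\n']
```

Because it is a generator, the result can be passed straight to `fout.writelines(clean_lines(fin))` without reading the whole file into memory first.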