删除python中的特定字符串和特定空行

时间:2019-05-23 07:20:52

标签: python string text

我要在文本文件中删除特定的字符串和空行,这是我先前的问题……我参考了我们的SO专家的一些示例和解决方案……通过删除字符串,它可以很好地工作,但是不是空行。为了易于理解,我在这里重点说明问题。

文本文件的某些部分包含stringA,stringB和stringC行,并且在其下面还留有空行,只能删除其下的一行。

line0
line1      stringAxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line2                stringBxxxxxxxxxxxxxxxxxxxxxxx
line3        stringCxxxxxxxxxxxxxxxxxxx 
line4
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line11               stringBxxxxxxxxxxxxxxxxxxxxxxx
line12       stringCxxxxxxxxxxxxxxxxxxx  
line13
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line26               stringBxxxxxxxxxxxxxxxxxxxxxxx
line27       stringCxxxxxxxxxxxxxxxxxxx  
line28
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

在这种情况下,删除任何具有stringA,stringB,stringC及其后一行的行。例如,在上面删除第1,2,3,4行,在第11,12,13行中删除第26,27,28行

我尝试使用strip(),但是它删除了所有空行。这是我使用的脚本,它确实删除了包含stringA,stringB和stringC的所有行。

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
   for line in lines:
        if not re.match(r"\s+(stringA|stringB|stringC)", line):
            fout.write(line)

预期产量

line0
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

感谢您的帮助和友善的帮助。谢谢。

2 个答案:

答案 0 :(得分:1)

我很确定这不是最好的答案,但是“类似标志的”方法可以工作:

import re
filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()

flag = 0

with open('clean.txt', 'w') as fout:
    for line in lines:
        if not re.match(r'.*(stringA|stringB|stringC)', line):
            if not flag:
                fout.write(line)
            flag = 0
        else:
            flag = 1

希望有帮助

答案 1 :(得分:1)

优化的解决方案:

with open('raw.txt', 'r') as fin, open('clean.txt', 'w') as fout:
    string_c_pat = re.compile(r'\s+stringC')
    pat = re.compile(r"\s+(stringA|stringB|stringC)")

    for line in fin:    # traversing file as iterator 
        if string_c_pat.match(line):
            next(fin)   # skip `stringC` line and jump to next line
        if not pat.match(line):
            fout.write(line)

  

使用re.compile()并保存结果正则表达式   使用表达式时,可重用的对象更有效   在一个程序中几次。