Python: Replace a repeated line in a file with an empty string, except for its first and last occurrence

Time: 2016-03-24 15:17:41

Tags: python-2.7 file

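I have a file containing a repeated line, <this is repeated>, which I want to replace with an empty string "". However, the first and last occurrences of the repeated line must not be replaced. I tried replace() earlier, but that function replaces every occurrence of the string in the file. Is there a way to write this so I get the expected result? P.S. This is a large text file.

The file looks like this: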
<this is repeated>
second line
another lines
third line
<this is repeated>
<this is repeated>

1 answer:

Answer 0: (score: 0)

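Note: I realized after posting that if the last occurrence is the final line of the file and has no trailing \n, this technique will leave it in place along with the next-to-last occurrence, because rep_line includes the \n and so never matches that final line.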

First, you need to iterate over the file until you find the first occurrence:

file = <OPEN FILE>
rep_line = "<this is repeated>\n"

beginning = ""  # accumulates everything up to and including the first occurrence
while True:  # exits once rep_line is found (or raises at end of file)
    line = file.readline()
    if not line:
        raise EOFError("reached end of file before finding first occurrence")
    beginning += line
    if line == rep_line:
        break

rest = file.read() #you can read the rest after iterating over a few lines
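
As a rough, self-contained illustration of the step above, the sketch below runs the same search on the sample file from the question. The helper name split_at_first is hypothetical, and io.StringIO stands in for the real file object, which the answer leaves as a placeholder. It is written for Python 3; on Python 2.7 the text passed to io.StringIO would need to be unicode.

```python
import io


def split_at_first(stream, rep_line):
    """Return (beginning, rest); beginning ends with the first rep_line."""
    beginning = []
    # readline() returns "" at end of file, which stops the iteration
    for line in iter(stream.readline, ""):
        beginning.append(line)
        if line == rep_line:
            # everything not yet consumed becomes rest
            return "".join(beginning), stream.read()
    raise EOFError("reached end of file before finding first occurrence")


# In-memory stand-in for the file shown in the question
sample = io.StringIO(
    "<this is repeated>\n"
    "second line\n"
    "another lines\n"
    "third line\n"
    "<this is repeated>\n"
    "<this is repeated>\n"
)
beginning, rest = split_at_first(sample, "<this is repeated>\n")
```

Here the repeated line is the very first line of the sample, so beginning holds only that line and rest holds the other five.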

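You now have beginning, which holds everything up to and including the first occurrence, and rest, which holds the remainder of the file.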

So all you need to do with rest is count how many times the line occurs in it and replace all but the last occurrence:

reps = rest.count(rep_line)

new_text = beginning + rest.replace(rep_line,"",reps - 1)
                                               #     ^ don't replace the last one
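
To make the counting step concrete, here is a small worked example on the question's sample data. The values of beginning and rest are written out by hand as they would stand after the first step, and the max(..., 0) guard is an addition for readability rather than part of the answer.

```python
rep_line = "<this is repeated>\n"

# State after the first step, for the sample file in the question
beginning = "<this is repeated>\n"
rest = (
    "second line\n"
    "another lines\n"
    "third line\n"
    "<this is repeated>\n"
    "<this is repeated>\n"
)

reps = rest.count(rep_line)  # 2 occurrences remain in rest

# The third argument of str.replace limits how many matches, counted
# from the left, are replaced; the guard keeps it non-negative when
# rest contains no occurrence at all.
new_text = beginning + rest.replace(rep_line, "", max(reps - 1, 0))

print(new_text)
# <this is repeated>
# second line
# another lines
# third line
# <this is repeated>
```

Because the replacements are taken from the left, the occurrence left untouched is always the last one in rest.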

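However, this straightforward approach also matches lines that merely end with the text (e.g. "hello <this is repeated>"). This can be fixed by checking that a \n comes immediately before the line: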

reps = rest.count("\n"+rep_line)

new_text = beginning + rest.replace("\n"+rep_line,"\n",reps - 1)
                                                  # ^ replace with a single newline
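
One caveat with the "\n"-prefixed search: str.count and str.replace only find non-overlapping matches, so when two repeated lines are adjacent they share a newline and only the first of the pair is counted. A repeated line at the very start of rest is not matched either, since its preceding "\n" sits in beginning. As a possible alternative, the sketch below compares whole lines instead of substrings; this is a different technique from the one in the answer, not a restatement of it. The function name drop_middle_repeats is hypothetical, and like the answer's file.read() it holds the whole file in memory.

```python
def drop_middle_repeats(lines, target):
    """Remove every line equal to target except its first and last occurrence."""
    lines = list(lines)
    # Compare without the line ending so a final line lacking "\n" still matches
    hits = [i for i, line in enumerate(lines) if line.rstrip("\r\n") == target]
    middle = set(hits[1:-1])  # empty when there are two or fewer occurrences
    return [line for i, line in enumerate(lines) if i not in middle]


text = (
    "hello <this is repeated>\n"   # only ends with the text: kept
    "<this is repeated>\n"         # first occurrence: kept
    "<this is repeated>\n"         # adjacent repeat: removed
    "a\n"
    "<this is repeated>\n"         # removed
    "<this is repeated>"           # last occurrence, no trailing newline: kept
)
result = "".join(drop_middle_repeats(text.splitlines(True), "<this is repeated>"))
```

With a real file, the lines could come from iterating over the open file object, and the returned list could be written back with writelines().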