Question

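I have a file containing a repeated line, <this is repeated>, which I want to replace with a blank (""). However, the first and last occurrences of the repeated line should not be replaced. I tried replace() earlier, but that function replaces every occurrence of the string in the file. Is there a way to write this so that I get the expected result? P.S. This is a large text file.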

The file looks like this:
<this is repeated>
second line
another lines
third line
<this is repeated>
<this is repeated>

Answer 1

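Note: I realized after posting that if the last occurrence is the final line of the file and has no trailing \n, this technique will leave it in place along with the next-to-last occurrence.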

First, you need to iterate over the file until you find the first match:

file = open("input.txt")  # hypothetical path; substitute your own file
rep_line = "<this is repeated>\n"

beginning = ""  # accumulates everything up to and including the first occurrence
while True:  # exits when rep_line is found; raises if the end of the file comes first
    line = file.readline()
    if not line:
        raise EOFError("reached end of file before finding first occurrence")
    beginning += line
    if line == rep_line:
        break

rest = file.read()  # the remainder can be read in one call after the first match

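You then have beginning, which contains everything up to and including the first occurrence, and rest, which contains the remainder of the file.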

So all you need to do with rest is count how many times the line occurs in it and replace all but the last occurrence:

reps = rest.count(rep_line)

# Replace at most reps - 1 occurrences, which leaves the last one in place.
new_text = beginning + rest.replace(rep_line, "", reps - 1)
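
Putting the two steps together, here is a minimal sketch that wraps the approach in a function and runs it on the sample from the question. The function name blank_middle_repeats and the use of io.StringIO as a stand-in for a real file are assumptions made for illustration, not part of the original answer.

```python
import io


def blank_middle_repeats(file, rep_line):
    # `file` is any open text file object; `rep_line` must include its
    # trailing newline, exactly as in the snippets above.
    beginning = ""
    while True:
        line = file.readline()
        if not line:
            raise EOFError("reached end of file before finding first occurrence")
        beginning += line
        if line == rep_line:
            break

    rest = file.read()
    reps = rest.count(rep_line)
    # Remove all but the last occurrence found in `rest`.
    return beginning + rest.replace(rep_line, "", reps - 1)


sample = (
    "<this is repeated>\n"
    "second line\n"
    "another lines\n"
    "third line\n"
    "<this is repeated>\n"
    "<this is repeated>\n"
)

# The middle repeat is removed; the first and last are kept.
print(blank_middle_repeats(io.StringIO(sample), "<this is repeated>\n"))

# Same input, but the final line has no trailing newline.
no_newline = sample.rstrip("\n")
print(blank_middle_repeats(io.StringIO(no_newline), "<this is repeated>\n") == no_newline)
# prints True: nothing was removed
```

The second call reproduces the caveat from the note at the top of this answer: when the final line has no "\n", it is not counted, so the next-to-last occurrence is treated as the last one and is left in place.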

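However, this direct approach will also pick up lines that merely end with the text (for example "hello <this is repeated>"), which can be fixed by checking that a \n immediately precedes the line: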

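reps = rest.count("\n" + rep_line)

# Replace each match with a single newline, again leaving the last one in place.
# Caveat: matches do not overlap, so a repeat that directly follows another
# repeat shares its "\n" with the previous match and is not counted.
new_text = beginning + rest.replace("\n" + rep_line, "\n", reps - 1)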
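
As an alternative to substring matching, the same goal can be sketched by comparing whole lines. This is a different technique from the one above, shown only as a possible way to sidestep the edge cases already mentioned (lines that merely end with the text, a final line with no newline, and repeats that sit directly next to each other). The function name remove_middle_repeats and the src/dst parameters are hypothetical, and the sketch assumes src is seekable, since it reads the input twice instead of holding it all in memory.

```python
import io


def remove_middle_repeats(src, dst, target):
    # `src` is a seekable text file object, `dst` a writable one, and
    # `target` is the repeated line WITHOUT its line ending.

    # Pass 1: find the line numbers of the first and last occurrence.
    first = last = None
    for i, line in enumerate(src):
        if line.rstrip("\n") == target:
            if first is None:
                first = i
            last = i

    # Pass 2: copy every line except the occurrences in between.
    src.seek(0)
    for i, line in enumerate(src):
        if line.rstrip("\n") == target and i not in (first, last):
            continue  # write "\n" here instead to leave an empty line
        dst.write(line)


src = io.StringIO(
    "<this is repeated>\n"
    "hello <this is repeated>\n"
    "<this is repeated>\n"
    "<this is repeated>\n"
    "third line\n"
    "<this is repeated>"  # final line with no trailing newline
)
dst = io.StringIO()
remove_middle_repeats(src, dst, "<this is repeated>")
print(dst.getvalue())
```

With a real file, src and dst would be two open() handles; for a large file this keeps only one line in memory at a time, at the cost of a second read.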

Python: replace repeated lines in a file with blanks, except the first/last occurrence
