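I have a file in which the line <this is repeated> occurs several times, and I want to replace it with an empty string "". However, the first and the last occurrence of the repeated line must not be replaced. I tried replace(), but that function replaces every occurrence in the file. Is there a way to write this so that I get the expected result? PS: this is a large text file.
The file looks like this: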
<this is repeated>
second line
another lines
third line
<this is repeated>
<this is repeated>
Answer 0 (score: 0)
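Note: I realized after posting that if the last occurrence is the final line of the file and has no trailing \n, it never matches rep_line, so this technique leaves both it and the next-to-last occurrence in place.
First, you need to iterate over the file until you find the first occurrence: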
file = open("input.txt")  # hypothetical path; open your own file here
rep_line = "<this is repeated>\n"
beginning = ""  # accumulates everything up to and including the first match
while True:  # exits when rep_line is found; raises if end of file comes first
    line = file.readline()
    if not line:
        raise EOFError("reached end of file before finding first occurrence")
    beginning += line
    if line == rep_line:
        break
rest = file.read()  # you can read the remainder in one go after iterating over a few lines
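You then have beginning, which contains everything up to and including the first occurrence, and rest, which holds the remainder of the file. All you need to do with rest is count how many times the line occurs and replace all but the last one: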
reps = rest.count(rep_line)
# passing reps - 1 as the count leaves the last occurrence untouched
new_text = beginning + rest.replace(rep_line, "", reps - 1)
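Putting the two steps together, here is a minimal, self-contained sketch run against the sample file from the question. The function name drop_middle_occurrences and the use of io.StringIO in place of a real file are assumptions made for illustration only; the logic is the same count-then-replace idea.

```python
import io


def drop_middle_occurrences(stream, rep_line):
    """Remove every occurrence of rep_line except the first and the last.

    Hypothetical helper: stream is any text file object, rep_line must
    include its trailing newline.
    """
    beginning = []
    # Read line by line until the first occurrence has been consumed.
    for line in iter(stream.readline, ""):
        beginning.append(line)
        if line == rep_line:
            break
    else:
        raise EOFError("reached end of file before finding first occurrence")

    rest = stream.read()
    reps = rest.count(rep_line)
    if reps > 1:
        # Replace all but the last occurrence found in the remainder.
        rest = rest.replace(rep_line, "", reps - 1)
    return "".join(beginning) + rest


sample = (
    "<this is repeated>\n"
    "second line\n"
    "another lines\n"
    "third line\n"
    "<this is repeated>\n"
    "<this is repeated>\n"
)
result = drop_middle_occurrences(io.StringIO(sample), "<this is repeated>\n")
print(result, end="")
```

For the sample, only the middle occurrence disappears; the first and the final one stay. Note that this still reads the whole remainder into memory, which may matter for a very large file.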
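However, this straightforward approach also picks up lines that merely end with the text (for example "hello <this is repeated>"), which can be fixed by checking that the match is preceded by a \n: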
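# caveat: matches cannot overlap, so an occurrence that directly follows
# another occurrence (or sits at the very start of rest) is not counted
reps = rest.count("\n" + rep_line)
# replace each match with a single newline, leaving the last one untouched
new_text = beginning + rest.replace("\n" + rep_line, "\n", reps - 1)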
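Since the file is large, a line-by-line variant may be worth considering. The sketch below is a swapped-in technique, not the method above: it makes two passes over the file, compares whole lines with the newline stripped (so a final line without \n still matches and partial matches such as "hello <this is repeated>" do not), and never holds more than one line in memory. The function name strip_middle_repeats, the file names, and the UTF-8 encoding are assumptions.

```python
import os
import tempfile


def strip_middle_repeats(src_path, dst_path, target):
    """Copy src_path to dst_path, dropping every line equal to target
    except its first and last occurrence (hypothetical helper)."""
    # Pass 1: find the line numbers of the first and last occurrence.
    first = last = None
    with open(src_path, encoding="utf-8") as src:
        for number, line in enumerate(src):
            if line.rstrip("\n") == target:
                if first is None:
                    first = number
                last = number

    # Pass 2: copy the file, skipping occurrences strictly in between.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for number, line in enumerate(src):
            if line.rstrip("\n") == target and first < number < last:
                continue
            dst.write(line)


# Demonstration on a small temporary file whose last line has no newline.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("<this is repeated>\nsecond line\n<this is repeated>\n"
                "<this is repeated>\nthird line\n<this is repeated>")
    strip_middle_repeats(src_path, dst_path, "<this is repeated>")
    with open(dst_path, encoding="utf-8") as f:
        cleaned = f.read()
print(cleaned)
```

This version removes the matching lines entirely rather than leaving blank lines behind; if an empty line should remain in their place, writing "\n" instead of skipping would do that.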