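I have a string in Python and want to remove duplicate lines (that is, when the text between two \n characters is identical, drop the second, third, fourth, ... occurrence) while preserving the order of the remaining lines. For example: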
line1 \n line2 \n line3 \n line2 \n line2 \n line 4
would return:
line1 \n line2 \n line3 \n line 4
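The other examples I have seen on Stack Overflow deal with this at the stage of reading a text file into Python (e.g. using readline(), checking whether the line is already in a set of lines read so far, and appending it to the string only if it is unique). That does not work in my case, because the string I have loaded into Python has already been heavily manipulated... and it seems very clumsy to, say, write the whole string out to a txt file and then read it back line by line looking for duplicates.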
Answer 0 (score: 6)
For Python 2.7+, this can be done in a single line:
from collections import OrderedDict
test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
"\n".join(OrderedDict.fromkeys(test_string.split("\n")))
This gives me: 'line1 \n line2 \n line3 \n line 4'
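On Python 3.7 and later, plain dict objects are guaranteed to keep insertion order, so the same idea should work without OrderedDict. The following is only a sketch under that assumption; the helper name dedupe_lines is made up for illustration and is not part of any library.

```python
def dedupe_lines(text):
    """Drop repeated lines from text, keeping the first occurrence of each.

    Assumes Python 3.7+, where dict preserves insertion order.
    dedupe_lines is a hypothetical helper name used for illustration.
    """
    # dict.fromkeys keeps one key per distinct line, in first-seen order.
    return "\n".join(dict.fromkeys(text.split("\n")))


test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
result = dedupe_lines(test_string)
print(repr(result))
```

Lines are compared exactly, whitespace included, so " line2 " and "line2" would be treated as different lines.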
Answer 1 (score: 2)
>>> lines = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
>>> seen = set()
>>> answer = []
>>> for line in lines.splitlines():
...     if line not in seen:
...         seen.add(line)
...         answer.append(line)
...
>>> print('\n'.join(answer))
line1
 line2
 line3
 line 4
>>> '\n'.join(answer)
'line1 \n line2 \n line3 \n line 4'
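If the lines may differ only in surrounding whitespace, the loop above can be wrapped in a function that takes an optional normalizing key. This is a minimal sketch rather than a definitive implementation; the name unique_lines and its key parameter are assumptions introduced here for illustration.

```python
def unique_lines(text, key=None):
    """Return text with repeated lines removed, keeping first occurrences.

    unique_lines and its key parameter are hypothetical names for this sketch.
    key, if given, maps each line to the value used for the duplicate check;
    the original line is what gets kept in the output.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        marker = key(line) if key else line
        if marker not in seen:
            seen.add(marker)
            kept.append(line)
    return "\n".join(kept)


text = "line1\nline2 \n line2\nline3"
exact = unique_lines(text)                    # exact comparison
stripped = unique_lines(text, key=str.strip)  # ignore surrounding whitespace
print(repr(exact))
print(repr(stripped))
```

Note that splitlines() also splits on other line boundaries such as \r\n and does not produce a trailing empty entry for a final newline, so the result can differ from split("\n") on such input.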