Remove duplicate lines from a string in Python

Asked: 2015-02-14 17:42:10

Tags: python regex python-2.7

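I have a string in Python and want to remove duplicate lines (i.e., when the text between two \n characters is identical, remove the second, third, fourth, ... occurrence) while preserving the order of the string. For example: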

line1 \n line2 \n line3 \n line2 \n line2 \n line 4

would return:

line1 \n line2 \n line3 \n line 4

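The other examples I have seen on Stack Overflow handle this at the stage of reading a text file into Python (e.g., using readline(), checking whether the line is already in a set of lines read so far, and appending it to the string only if it is unique). That does not work in my case, because the string I have loaded into Python has already been heavily manipulated... and it seems very clumsy to, say, write the whole string out to a txt file and then read it back line by line to look for duplicate lines.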

2 answers:

Answer 0 (score: 6)

For Python 2.7+, this can be done in one line:

from collections import OrderedDict

test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"

"\n".join(list(OrderedDict.fromkeys(test_string.split("\n"))))

This gives me: 'line1 \n line2 \n line3 \n line 4'
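
For reuse, the same one-liner could be wrapped in a small function. This is only a sketch: the name dedupe_lines is hypothetical and not part of the answer above. On Python 3.7+, where a plain dict preserves insertion order, dict.fromkeys should work in place of OrderedDict.fromkeys.

```python
from collections import OrderedDict


def dedupe_lines(text):
    """Drop repeated lines, keeping the first occurrence and the original order.

    dedupe_lines is a hypothetical helper name, not part of the original answer.
    """
    # fromkeys keeps one key per distinct line, in order of first appearance.
    return "\n".join(OrderedDict.fromkeys(text.split("\n")))


test_string = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
result = dedupe_lines(test_string)
print(repr(result))
```

Lines are compared exactly, including leading and trailing spaces, so " line2 " and "line2" would count as different lines.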

Answer 1 (score: 2)

>>> lines = "line1 \n line2 \n line3 \n line2 \n line2 \n line 4"
>>> seen = set()
>>> answer = []
>>> for line in lines.splitlines():
...     if line not in seen:
...         seen.add(line)
...         answer.append(line)
... 
>>> print '\n'.join(answer)
line1 
 line2 
 line3 
 line 4
>>> '\n'.join(answer)
'line1 \n line2 \n line3 \n line 4'
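
The loop above can be sketched as a function. The name unique_lines is hypothetical and used only for illustration. One difference from the split("\n") approach is worth noting: splitlines() also splits on "\r\n" and does not produce an empty final entry for a trailing newline.

```python
def unique_lines(text):
    """Return text with repeated lines removed, first occurrences kept in order.

    unique_lines is a hypothetical helper name used only for illustration.
    """
    seen = set()   # lines already emitted
    kept = []      # unique lines in order of first appearance
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


# splitlines() handles "\r\n" and a trailing newline cleanly, whereas
# split("\n") leaves "\r" attached and yields a final empty string.
windows_text = "a\r\nb\r\na\r\n"
print(repr(unique_lines(windows_text)))
print(repr(windows_text.split("\n")))
```

The result is always joined with "\n", so any original "\r\n" line endings are not preserved in the output.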