Filtering unique lines in a text file in Python

Posted: 2017-01-21 20:41:07

Tags: python python-2.7

I want to print the lines of a text file that occur only once.

For example, if my text file contains:

12345
12345
12474
54675
35949
35949
74564

I want my Python program to print:

12474
54675
74564

I am using Python 2.7.

4 answers:

Answer 0 (score: 2)

Try this:

from collections import OrderedDict

seen = OrderedDict()
with open('file.txt') as f:
    for line in f:
        line = line.strip()
        seen[line] = seen.get(line, 0) + 1

print("\n".join([k for k,v in seen.items() if v == 1]))

This prints:

12474
54675
74564

Update: thanks to the comments below, this is even better:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

with open('file.txt') as f:
    seen = OrderedCounter([line.strip() for line in f])
    print("\n".join([k for k,v in seen.items() if v == 1]))
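On Python 3.7 and later (not the asker's 2.7), the OrderedCounter helper is no longer needed: dict preserves insertion order, and Counter, being a dict subclass, inherits that behaviour. A minimal sketch under that assumption; the function name unique_lines is hypothetical, and it accepts any iterable of lines so it can be tried without a file:

```python
from collections import Counter


def unique_lines(lines):
    """Return the stripped lines that occur exactly once, in first-seen order.

    Assumes Python 3.7+, where Counter (a dict subclass) keeps insertion order.
    """
    counts = Counter(line.strip() for line in lines)
    return [line for line, n in counts.items() if n == 1]


if __name__ == "__main__":
    sample = ["12345\n", "12345\n", "12474\n", "54675\n",
              "35949\n", "35949\n", "74564\n"]
    print("\n".join(unique_lines(sample)))
```

With a real file, the open file object can be passed directly, e.g. inside `with open('file.txt') as f:` call `unique_lines(f)`.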

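Answer 1 (score: 2)

Use count() to check how many times each element occurs in the list, and inside a for loop remove every occurrence of a duplicate with index() and pop(). The loop runs over a copy of the list, because popping items from the list being looped over would make the loop skip elements:

with open("file.txt", "r") as f:
    data = f.readlines()

for x in list(data):  # iterate over a copy, since items are popped from data
    if data.count(x) > 1:  # if item is a duplicate
        for i in range(data.count(x)):
            data.pop(data.index(x))  # find each occurrence of the duplicate and remove it

with open("file.txt", "w") as f:
    f.write("".join(data))  # write the remaining lines back to the file as a string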

file.txt afterwards (this answer overwrites the file instead of printing the result):

12474
54675
74564
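A caveat for this approach: removing items from a list while a for loop is iterating over that same list shifts the remaining items left, so the loop can skip some of them. The sketch below runs the same count()/index()/pop() removal on a hypothetical input of three duplicated pairs, once looping over the list itself and once over a copy; remove_duplicates and its iterate_over_copy flag are illustrative names, not part of the answer:

```python
def remove_duplicates(data, iterate_over_copy):
    """Remove every item that occurs more than once, in place.

    iterate_over_copy is a hypothetical switch for comparing the two loop styles.
    """
    source = list(data) if iterate_over_copy else data
    for x in source:
        if data.count(x) > 1:  # x is a duplicate
            for _ in range(data.count(x)):
                data.pop(data.index(x))  # remove each occurrence of x
    return data


if __name__ == "__main__":
    sample = ["a", "a", "b", "b", "c", "c"]
    print(remove_duplicates(list(sample), iterate_over_copy=False))  # ['c', 'c']
    print(remove_duplicates(list(sample), iterate_over_copy=True))   # []
```

Looping over the list directly leaves the "c" pair behind, because after two removals the loop index has already moved past it; looping over a copy visits every original item.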

Answer 2 (score: 2)

You can use OrderedDict and Counter together to remove the duplicates while preserving the original order:

from collections import OrderedDict, Counter

class OrderedCounter(Counter, OrderedDict):
    pass

with open('/tmp/hello.txt') as f:
    ordered_counter = OrderedCounter(f.readlines())

new_list = [k.strip() for k, v in ordered_counter.items() if v==1]
# ['12474', '54675', '74564']
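One caveat with counting the raw lines from f.readlines(): each line keeps its trailing newline, so if the last line of the file has no newline, a value such as 74564 at the very end and 74564 followed by a newline earlier in the file are counted as two different lines. A sketch, assuming the same OrderedCounter recipe as in the answer, that strips the newline before counting rather than after; the function name count_once is hypothetical:

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order keys were first seen (Python 2.7 and 3)."""
    pass


def count_once(lines):
    # Normalise line endings *before* counting, so "x\n" and a final "x" match.
    ordered_counter = OrderedCounter(line.rstrip("\n") for line in lines)
    return [k for k, v in ordered_counter.items() if v == 1]


if __name__ == "__main__":
    lines = ["12474\n", "74564\n", "54675\n", "74564"]  # last line has no newline
    # Counting the raw lines treats "74564\n" and "74564" as different keys:
    print([k.strip() for k, v in OrderedCounter(lines).items() if v == 1])
    # ['12474', '74564', '54675', '74564']
    print(count_once(lines))
    # ['12474', '54675']
```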

Answer 3 (score: 0)

Not the most efficient, since count() rescans the whole list for every line, but simple:

with open("input.txt") as f:
    orig = list(f)
    filtered = [x for x in orig if orig.count(x)==1]

print("".join(filtered))
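  • Convert the file into a list of lines
  • Build a list comprehension that keeps only the lines occurring exactly once
  • Print the result (joined with an empty string, since each line still contains its newline)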
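The steps above call orig.count(x) once per line, and each call scans the entire list, so the work grows quadratically with the number of lines. A sketch that keeps the same three steps but counts every line once up front with collections.Counter, so each check becomes a dictionary lookup; the function name filter_once is hypothetical:

```python
from collections import Counter


def filter_once(orig):
    """Keep the lines of orig that occur exactly once, preserving file order."""
    counts = Counter(orig)  # one pass over the list instead of one count() per line
    return [x for x in orig if counts[x] == 1]


if __name__ == "__main__":
    # Stand-in for list(f); the lines keep their trailing newlines.
    orig = ["12345\n", "12345\n", "12474\n", "54675\n",
            "35949\n", "35949\n", "74564\n"]
    print("".join(filter_once(orig)))
```

Because the comprehension walks orig itself rather than the Counter's keys, the output order follows the file on Python 2.7 as well as on 3.x.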