Question

Suppose I have a text file, test.txt, in which some lines are repeated.

I want to write every line that appears exactly once in the input file to a new file and drop all the other lines:

001010
000011
111111

This code doesn't do what I want, because it only collapses repeated lines down to a single copy instead of removing them entirely:

lines_seen = set()
with open("leadsNoDupes.txt", "w+") as output_file:
    for each_line in open("leads.txt", "r"):
        if each_line not in lines_seen:
            output_file.write(each_line)
            lines_seen.add(each_line)

How can I do this correctly?

Answer 1

You need to keep a count:

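from collections import Counter

# Read every line up front so each one can be counted before anything is written.
with open("leads.txt", "r") as input_file:
    lines = input_file.readlines()

# Map each distinct line to the number of times it occurs.
counts = Counter(lines)

with open("leadsNoDupes.txt", "w") as output_file:
    for line in lines:
        # Keep only lines that occur exactly once, in their original order.
        if counts[line] == 1:
            output_file.write(line)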

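This collects more information than strictly necessary, since we don't really need to know whether a line occurs 2, 3, or 7 times, but it still runs in linear time.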
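If the exact counts bother you, a possible variant is to record only whether each line has been seen once or more than once. The following is a minimal sketch under the assumption that the input fits in memory as a list; `unique_lines`, `seen_once`, and `seen_more` are illustrative names, not anything from the question.

```python
def unique_lines(lines):
    """Return the lines that occur exactly once, preserving their order.

    `lines` is assumed to be a list, because it is traversed twice.
    """
    seen_once = set()   # lines encountered exactly once so far
    seen_more = set()   # lines encountered two or more times
    for line in lines:
        if line in seen_more:
            continue                    # already known to be a duplicate
        if line in seen_once:
            seen_once.remove(line)      # second occurrence: demote it
            seen_more.add(line)
        else:
            seen_once.add(line)         # first occurrence
    # Second pass keeps the original ordering of the surviving lines.
    return [line for line in lines if line in seen_once]
```

It would be used like the version above: read the lines of leads.txt into a list, call `unique_lines` on it, and write the result to leadsNoDupes.txt. It is still linear, and in practice the `Counter` version is probably simpler to read.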
