Question

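I'm trying to search the contents of filetwo and check whether it contains a duplicate of a given search term (a line from fileone). If it contains a duplicate, it should do nothing, but if it does not, I want the line to be appended.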

fileone.txt (two lines)

[('123', 'aaa')]

[('900', 'abc')]

filetwo.txt

[('123', 'aaa')]

[('999', 'zzz')]

My code adds the lines to filetwo even when they are duplicates. I can't figure this out!

with open('fileone.txt', 'r') as f:
    seen = open('filetwo.txt', 'a+')
    for line in f:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            seen.write(line)

f.close()
seen.close()

Answer 1

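You can't rely on if line in seen: to search the whole seen file for a given line. At best it searches the rest of the file from the current position, and because a file opened in 'a+' mode starts at the end, that means you are searching nothing. And even if you fixed that, it would still take a linear search of the whole file for every line, which would be very slow.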
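
To make the point about file position concrete, here is a small self-contained sketch. It assumes Python 3, where `in` on a file object falls back to iterating from the current position. The helper name `contains_line` and the temporary file are only for illustration and are not part of the original code.

```python
import os
import tempfile

def contains_line(path, mode, line):
    """Open `path` in `mode` and test membership by iterating the file object."""
    with open(path, mode) as fh:
        # `in` has no index to consult: it reads lines from the current position.
        return line in fh

# Build a throwaway copy of filetwo.txt so the demonstration is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'filetwo.txt')
    with open(path, 'w') as fh:
        fh.write("[('123', 'aaa')]\n[('999', 'zzz')]\n")

    target = "[('123', 'aaa')]\n"
    found_from_start = contains_line(path, 'r', target)   # position starts at 0
    found_from_end = contains_line(path, 'a+', target)    # position starts at end of file

print(found_from_start, found_from_end)
```

Opened for reading, the search starts at the top and finds the line. Opened with 'a+', the same search starts at the end of the file and finds nothing, which is why every line gets appended.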

The simplest approach is to keep track of all the lines you have seen, for example in a set:

with open('filetwo.txt') as f:
    seen = set(f)

with open('fileone.txt') as fin, open('filetwo.txt', 'a+') as fout:
    for line in fin:
        if line in seen:
            print(line + 'is a duplicate')
        else:
            fout.write(line)
            seen.add(line)

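Note that I pre-populate seen with all the lines in filetwo.txt before we start, and then add each new line to seen as we go. This avoids rereading filetwo.txt over and over: we know what we are writing, so we just remember it.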
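
One edge case worth keeping in mind: lines read from a file keep their trailing newline, so if filetwo.txt does not end with a newline, its last line will not compare equal to the same line in fileone.txt, and the first appended line will be glued onto it. The following is a minimal sketch of the same set-based idea that compares lines without the trailing newline. The function name `append_missing_lines` is hypothetical, and the demo writes the sample data to temporary files rather than touching real ones.

```python
import os
import tempfile

def append_missing_lines(src_path, dst_path):
    """Append each line of src_path to dst_path unless it is already present.

    Hypothetical helper for illustration. Lines are compared without their
    trailing newline. Returns the list of lines that were appended.
    """
    seen = set()
    last = ''
    with open(dst_path) as f:
        for last in f:
            seen.add(last.rstrip('\n'))
    # If the destination lacks a final newline, emit one before the first append.
    needs_newline = bool(last) and not last.endswith('\n')

    added = []
    with open(src_path) as fin, open(dst_path, 'a') as fout:
        for raw in fin:
            line = raw.rstrip('\n')
            if line in seen:
                continue
            if needs_newline:
                fout.write('\n')
                needs_newline = False
            fout.write(line + '\n')
            seen.add(line)   # remember what we wrote instead of rereading the file
            added.append(line)
    return added

# Demo using the sample data from the question, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, 'fileone.txt')
    two = os.path.join(tmp, 'filetwo.txt')
    with open(one, 'w') as f:
        f.write("[('123', 'aaa')]\n[('900', 'abc')]\n")
    with open(two, 'w') as f:
        f.write("[('123', 'aaa')]\n[('999', 'zzz')]")  # deliberately no final newline

    first_run = append_missing_lines(one, two)
    second_run = append_missing_lines(one, two)
    with open(two) as f:
        final_text = f.read()

print(first_run, second_run)
```

Running the helper a second time appends nothing, because every line from fileone.txt is by then already present in filetwo.txt.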
