Question

I've just started programming, and I'm trying to compare two files that look like this:

file1:
tootsie roll
apple
in the evening

file2:
hello world
do something
apple

output:
"Apple appears x times in file 1 and file 2"

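Honestly, I'm stumped. I've tried creating dictionaries, lists, tuples, and sets, and I can't seem to get the output I want. The closest I've gotten is output lines that look exactly like the contents of file1 / file2.

I've tried several snippets from this site, and I can't get any of them to output what I want. Any help would be greatly appreciated!

This is the last piece of code I tried. It doesn't write anything to the third file.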

f1 = open("C:\\Users\\Cory\\Desktop\\try.txt", 'r')
f2 = open("C:\\Users\\Cory\\Desktop\\match.txt", 'r')
output = open("C:\\Users\\Cory\\Desktop\\output.txt", 'w')

file1 = set(f1)
file2 = set(f2)
file(word,freq)
for line in f2:
    word, freq = line.split()
    if word in words:
        output.write("Both files have the following words: " + file1.intersection(file2))
f1.close()
f2.close()
output.close()

Answer 1

You don't need all those loops. If the files are small (i.e. less than a few hundred MB), you can work with their contents more directly:

words1 = f1.read().split()
words2 = f2.read().split()
words = set(words1) & set(words2)

words will be a set containing all the words common to both files. To ignore case, call lower() on the text before splitting it.
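As a rough sketch of the lower() idea, here is the same intersection done case-insensitively. The function name common_words and the sample strings are made up for illustration; the strings stand in for what f1.read() and f2.read() would return.

```python
def common_words(text1, text2):
    """Return the set of words found in both texts, ignoring case."""
    # Lowercase first so that "Apple" and "apple" count as the same word.
    words1 = text1.lower().split()
    words2 = text2.lower().split()
    return set(words1) & set(words2)


# Hypothetical sample data standing in for f1.read() and f2.read().
file1_text = "tootsie roll\nApple\nin the evening\n"
file2_text = "hello world\ndo something\napple\n"

shared = common_words(file1_text, file2_text)
print(shared)
```

Note that split() breaks the text on any whitespace, so a line such as "tootsie roll" is treated as two separate words rather than as one phrase.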

To get the count of each word, as asked for in the comments, just use the count() method:

with open('outfile.txt', 'w') as output:
    for word in words:
        # list.count() rescans the whole list for every word
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(
            word, words1.count(word), words2.count(word)))
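Calling count() once per word rescans the whole list each time, which can get slow as the files grow. One possible alternative, sketched below, is to tally each file once with collections.Counter. This is a swapped-in technique, not part of the answer above; the function name write_common_counts, the lowercasing, and the temporary demo files are all assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter


def write_common_counts(path1, path2, out_path):
    """Write per-file counts of the words shared by both files to out_path."""
    with open(path1) as f1, open(path2) as f2:
        # Assumption: case is ignored, so the text is lowercased before splitting.
        counts1 = Counter(f1.read().lower().split())
        counts2 = Counter(f2.read().lower().split())
    # Dictionary key views support set intersection directly.
    common = sorted(counts1.keys() & counts2.keys())
    with open(out_path, "w") as output:
        for word in common:
            output.write("{} appears {} times in f1 and {} times in f2.\n".format(
                word, counts1[word], counts2[word]))
    return common


# Demo with throwaway files (hypothetical contents) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    out = os.path.join(tmp, "output.txt")
    with open(p1, "w") as f:
        f.write("tootsie roll\napple\nin the evening\napple\n")
    with open(p2, "w") as f:
        f.write("hello world\ndo something\nApple\n")
    common = write_common_counts(p1, p2, out)
    with open(out) as f:
        report = f.read()

print(report)
```

Each file is read and tallied exactly once, and looking up a word's count in a Counter is a dictionary lookup rather than a scan of the list.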

Compare two text files (order doesn't matter) and output the words common to both files to a third file
