I have two text files and need to find the words that appear in both. Is there a more concise way to do it than this code?
file1 = set(line.strip() for line in open('/home/user1/file1.txt'))
file2 = set(line.strip() for line in open('/home/user1/file2.txt'))
for line in file1 & file2:
    if line:
        print(line)
Answer 0 (score: 2)
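You can write more concise code, but more importantly, you don't need to create two sets. set.intersection accepts any iterable, so only one set of lines has to be built, which lets your code handle larger data sets and run faster: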
with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # Build a set from file 1 only; the lines of file 2 are streamed through intersection().
    for line in set(map(str.rstrip, f1)).intersection(map(str.rstrip, f2)):
        print(line)
For Python 2, use itertools.imap:
from itertools import imap
with open('/home/user1/file1.txt') as f1, open('/home/user1/file2.txt') as f2:
    # imap is lazy, so neither file is read into an intermediate list.
    for line in set(imap(str.rstrip, f1)).intersection(imap(str.rstrip, f2)):
        print(line)
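This builds a single set and then intersects it with the iterable passed in, i.e. the str.rstrip-ed lines of file 2, as opposed to first creating two full sets of lines and then intersecting them.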
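To make the single-set idea concrete, here is a minimal sketch that runs without touching the file system. The helper name common_lines and the sample contents are made up for illustration, and io.StringIO objects stand in for the two open files. Blank lines are dropped to match the `if line:` check in the question.

```python
import io

def common_lines(f1, f2):
    """Return the non-empty, right-stripped lines present in both file-like objects."""
    # Only the lines of f1 are collected into a set up front.
    seen = set(map(str.rstrip, f1))
    # intersection() accepts any iterable, so f2 is consumed line by line
    # without first being turned into a second full set.
    common = seen.intersection(map(str.rstrip, f2))
    common.discard("")  # ignore blank lines, like the `if line:` check in the question
    return common

# In-memory stand-ins for the two files (hypothetical contents).
file1 = io.StringIO("apple\nbanana\n\ncherry\n")
file2 = io.StringIO("banana\ncherry\n\ndurian\n")

result = common_lines(file1, file2)
print(sorted(result))  # ['banana', 'cherry']
```

Passing the smaller file as f1 keeps the set that has to be held in memory as small as possible.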
Answer 1 (score: 0)
This version is shorter and closes both files after use:
with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1) & set(line.strip() for line in file2):
        if line:
            print(line)
A variation that builds only one set:
with open('/home/user1/file1.txt') as file1, open('/home/user1/file2.txt') as file2:
    for line in set(line.strip() for line in file1).intersection(line.strip() for line in file2):
        if line:
            print(line)
Answer 2 (score: 0)
Even shorter:
with open('/home/user/file1.txt') as file1, open('/home/user/file2.txt') as file2:
    # Parenthesised print works on both Python 2 and Python 3.
    print("".join([word + "\n" for word in set(file1.read().split()) & set(file2.read().split())]))
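Note that this last version splits on whitespace, so it compares individual words, whereas the earlier answers compare whole lines. The two agree only when each line holds a single word. The sketch below shows the difference on made-up sample text. The helper names common_words and common_lines are hypothetical, and io.StringIO stands in for the real files.

```python
import io

def common_words(f1, f2):
    """Whitespace-separated words present in both file-like objects."""
    return set(f1.read().split()) & set(f2.read().split())

def common_lines(f1, f2):
    """Non-empty stripped lines present in both file-like objects."""
    common = {line.strip() for line in f1} & {line.strip() for line in f2}
    common.discard("")  # ignore blank lines
    return common

# Hypothetical contents: the first line of each text holds two words.
text1 = "red apple\nbanana\n"
text2 = "green apple\nbanana\n"

# A fresh StringIO is created per call because reading consumes the stream.
words = common_words(io.StringIO(text1), io.StringIO(text2))
lines = common_lines(io.StringIO(text1), io.StringIO(text2))

print(sorted(words))  # ['apple', 'banana']
print(sorted(lines))  # ['banana']
```

Which one is wanted depends on whether the files hold one word per line or free-form text.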