I have two files containing numbers, and I am trying to compare them.
File1:
123
456
789
File2:
234
567
890
34342
I have run into two problems that I don't know how to solve. Here is my code:
import filecmp
file1 = open('file1.txt', 'r')
file2 = open('file2.txt', 'r')
file1Lines = file1.readlines()
file2Lines = file2.readlines()
matchedList = []
unmatchedList = []
for line in file1Lines:
    for secline in file2Lines:
        if line == secline:
            matchedList.append(line)
        else:
            unmatchedList.append(line)
file1.close()
file2.close()
print(unmatchedList)
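I am trying to iterate over the lines of both files and put the matching numbers (one copy only) into matchedList and the non-matching numbers into unmatchedList. I thought of iterating over file2Lines for every line in file1Lines (that shouldn't be a problem, since the files are relatively small). The problem is that every time a pair of lines doesn't match, it appends 'line' to the unmatched list, so each number is added once per comparison. This is what I end up with: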
['123\n', '123\n', '123\n', '123\n', '456\n', '456\n', '456\n', '456\n', '789\n', '789\n', '789\n', '789\n']
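The other problem I have is that if one of the files is longer (file2, for example), its extra lines are not checked, so I cannot tell whether those numbers match.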
Answer 0 (score: 2)
This looks like a job for the set data structure:
https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
file1_contents = '''123
456
789'''
file2_contents = '''234
567
123
456
456
234
123
876
890
34342'''
file1 = set(file1_contents.splitlines())
file2 = set(file2_contents.splitlines())
# intersection to find common lines
common = file1 & file2
# symmetric difference for finding uncommon lines
# all lines = file1 + file2
# all lines - common = (lines in 1 but not in 2) + (lines in 2 but not in 1)
uncommon = file1 ^ file2
print('common', common)
print('uncommon', uncommon)
Output (sets are unordered, so the element order may differ between runs):
common {'123', '456'}
uncommon {'789', '34342', '876', '890', '234', '567'}
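The same idea can be applied to the actual files rather than to inline strings. What follows is only a minimal sketch under a few assumptions: the helper names read_lines and compare_files are hypothetical, blank lines are ignored, and the sample data from the question is written to a temporary directory so the example runs anywhere. It also splits the uncommon lines by the file they came from, which addresses the case where one file is longer than the other.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Return the set of non-empty lines in a file, without trailing newlines."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_files(path1, path2):
    """Return (common lines, lines only in path1, lines only in path2)."""
    lines1 = read_lines(path1)
    lines2 = read_lines(path2)
    # & is set intersection, - is set difference
    return lines1 & lines2, lines1 - lines2, lines2 - lines1


# Demo with the sample data from the question, written to a temporary directory
# (assumption: in real use you would pass the paths of your own files instead).
with tempfile.TemporaryDirectory() as tmp:
    path1 = Path(tmp) / 'file1.txt'
    path2 = Path(tmp) / 'file2.txt'
    path1.write_text('123\n456\n789\n')
    path2.write_text('234\n567\n890\n34342\n')
    matched, only_in_file1, only_in_file2 = compare_files(path1, path2)

# sorted() gives a stable display order; the values are strings, so the order is lexicographic
print('matched', sorted(matched))
print('only in file1', sorted(only_in_file1))
print('only in file2', sorted(only_in_file2))
```

Because sets hold each value once, duplicates disappear automatically, and the length of either file no longer matters. If the original line order has to be preserved, a list comprehension that tests membership against a set would be one alternative.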