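I'm new to Python and I'm trying to write a script that compares two big files and writes the lines that differ to a third file. The first file has 16,227,989 lines and the second has 16,196,081. By my count the files differ by 31,971 lines, and I want the script to write those 31,971 lines to a third file. Instead, it writes only 65 lines. How can I fix this? Here is the script I wrote: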
python
with open('w1', 'r') as file1:
    size_to_read = 10000
    file1_contents = file1.read(size_to_read)
    while len(file1_contents) > 0:
        old_lines = file1_contents.split('\n')
        file1_contents = file1.read(size_to_read)
with open('w2', 'r') as file2:
    size_to_read = 10000
    file2_contents = file2.read(size_to_read)
    while len(file2_contents) > 0:
        new_lines = file2_contents.split('\n')
        file2_contents = file2.read(size_to_read)
old_lines_set = set(old_lines)
new_lines_set = set(new_lines)
old_added = old_lines_set - new_lines_set
old_removed = new_lines_set - old_lines_set
diff = old_removed
with open('output_file', 'w') as file_out:
    for line in old_lines:
        if line in old_added:
            file_out.write(line.strip() + '\n')
        elif line in old_removed:
            file_out.write(line.strip() + '\n')
    for line in new_lines:
        if line in old_added:
            file_out.write(line.strip() + '\n')
        elif line in old_removed:
            file_out.write(line.strip() + '\n')
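
One thing that stands out in the script above: each pass through the `while` loops overwrites `old_lines` and `new_lines`, so by the time the sets are built they hold only the last chunk of at most 10,000 characters from each file. Reading by character count can also cut a line in half at a chunk boundary. Below is a minimal sketch of one possible alternative, not a definitive fix. It assumes the distinct lines of both files fit in memory, that the order of the output does not matter, and that repeated lines can be collapsed (a set keeps one copy of each line, so the result may not equal the difference in line counts). The helper names `read_lines` and `write_line_diff` are made up for illustration.

```python
def read_lines(path):
    """Return the set of distinct lines in a file, without trailing newlines."""
    with open(path, 'r') as f:
        # Iterating over a file object yields one complete line at a time,
        # so no line is split at an arbitrary chunk boundary.
        return {line.rstrip('\n') for line in f}


def write_line_diff(path_a, path_b, out_path):
    """Write the lines that appear in only one of the two files.

    Returns the number of lines written. Assumption: both sets fit in memory.
    """
    lines_a = read_lines(path_a)
    lines_b = read_lines(path_b)
    # Symmetric difference: lines in A but not B, plus lines in B but not A.
    only_in_one = lines_a ^ lines_b
    with open(out_path, 'w') as out:
        # Sorting is only there to make the output order reproducible.
        for line in sorted(only_in_one):
            out.write(line + '\n')
    return len(only_in_one)
```

With the file names from the question, this would be called as `write_line_diff('w1', 'w2', 'output_file')`. If memory turns out to be a problem with 16 million lines per file, sorting both files on disk first and comparing them in a single pass would be another option to look into.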