Compare the contents of two BIG files and output the differences to a third file using Python

Time: 2018-12-11 23:17:06

Tags: python python-3.x

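I'm new to Python, and I'm trying to write a script that compares two BIG files and writes the lines that differ to a third file. The first file has 16227989 lines and the second has 16196081, so the second file has 31,971 more lines. I want the script to write those 31,971 lines to the third file. What actually happens is that it writes only 65 lines. I need a way to fix this. Here is the script I wrote: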

```python
with open('w1', 'r') as file1:
    size_to_read = 10000
    file1_contents = file1.read(size_to_read)


    while len(file1_contents) > 0:
        old_lines = file1_contents.split('\n')
        file1_contents = file1.read(size_to_read)

    with open('w2', 'r') as file2:
        size_to_read = 10000
        file2_contents = file2.read(size_to_read)

        while len(file2_contents) > 0:
            new_lines = file2_contents.split('\n')
            file2_contents = file2.read(size_to_read)


old_lines_set = set(old_lines)
new_lines_set = set(new_lines)

old_added = old_lines_set - new_lines_set
old_removed = new_lines_set - old_lines_set
diff = old_removed

with open('output_file', 'w') as file_out:
    for line in old_lines:
        if line in old_added:
            file_out.write(line.strip() + '\n')
        elif line in old_removed:
            file_out.write(line.strip() + '\n')

    for line in new_lines:
        if line in old_added:
            file_out.write(line.strip() + '\n')
        elif line in old_removed:
            file_out.write(line.strip() + '\n')
```
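A note on the script above: each pass through the `while` loops reassigns `old_lines` and `new_lines`, so by the time the sets are built they hold only the last chunk (at most 10000 characters) read from each file. Reading in fixed-size chunks can also cut a line in half at a chunk boundary. That would be consistent with only a handful of lines reaching the output. The following is a minimal sketch of one possible fix, not a definitive solution. It assumes both files' distinct lines fit in memory as sets, and the function name `write_line_diff` is made up for illustration.

```python
def write_line_diff(old_path, new_path, out_path):
    """Write lines that appear in only one of the two files to out_path.

    Hypothetical helper: assumes the distinct lines of both files fit
    in memory. Returns (count_only_in_old, count_only_in_new).
    """
    # Iterating over a file object yields whole lines one at a time,
    # so no line is split at a chunk boundary and none are discarded.
    with open(old_path, 'r') as old_file:
        old_lines = {line.rstrip('\n') for line in old_file}
    with open(new_path, 'r') as new_file:
        new_lines = {line.rstrip('\n') for line in new_file}

    only_in_old = old_lines - new_lines
    only_in_new = new_lines - old_lines

    with open(out_path, 'w') as out_file:
        # Sets are unordered, so sort to get a reproducible output file.
        for line in sorted(only_in_old):
            out_file.write(line + '\n')
        for line in sorted(only_in_new):
            out_file.write(line + '\n')

    return len(only_in_old), len(only_in_new)
```

With the file names from the question, the call would be `write_line_diff('w1', 'w2', 'output_file')`. Because sets ignore duplicates and ordering, the number of lines written need not equal the difference in line counts between the two files. If repeated lines matter, a multiset such as `collections.Counter` would be a closer fit.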

0 answers:

No answers yet