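Python program to compare two files and show the differences

Date: 2014-11-14 15:13:59

Tags: python file compare difflib

I have the following code to compare two files. I want this program to work when I point it at files as large as 4 or 5 MB. When I do that, the prompt cursor in the Python console just blinks and no output is shown. Once I let it run overnight, and it was still blinking the next morning. What can I change in this code?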

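import difflib

# readlines() loads both files fully into memory before ndiff starts;
# the with block closes the files afterwards
with open('/home/michel/Documents/first.csv', 'r') as file1, \
     open('/home/michel/Documents/second.csv', 'r') as file2:
    diff = difflib.ndiff(file1.readlines(), file2.readlines())
    delta = ''.join(diff)

print(delta)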

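2 answers:

Answer 0 (score: 0)

If you are on a Linux-based system, you can call the external diff command and use its output. I tried diff on two files of 14 MB and 9.3 MB, and it took about 1.3 seconds: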

real    0m1.295s
user    0m0.056s
sys     0m0.192s
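
Calling diff from Python could look roughly like the sketch below. It assumes Python 3.5 or later and a diff executable on the PATH; the helper name run_diff is made up for illustration and is not part of any library.

```python
import subprocess


def run_diff(path1, path2):
    """Run the external diff command on two files.

    Returns a tuple (differs, output), where output is diff's text output.
    """
    result = subprocess.run(
        ['diff', path1, path2],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text instead of bytes
    )
    # diff exits with 0 when the files are identical, 1 when they differ,
    # and a value greater than 1 when something went wrong
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.returncode == 1, result.stdout
```

For the files in the question this would be called as run_diff('/home/michel/Documents/first.csv', '/home/michel/Documents/second.csv'), and the second element of the result can then be printed or parsed.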

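Answer 1 (score: 0)

I ran into the same problem when I tried to use difflib the way you do. With large files, the whole content of both files is buffered in memory and then compared in one go. As a workaround, you can compare the two files piece by piece. Here I do it 100 lines at a time.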

import difflib

CHUNK_SIZE = 100  # number of lines compared at a time

lines_file1 = []
lines_file2 = []

with open('1.csv', 'r') as file1, open('2.csv', 'r') as file2:
    # zip pairs up the lines of both files and stops at the end of the
    # shorter one, so extra lines in the longer file are not compared
    # (on Python 2, itertools.izip avoids building the whole list first)
    for line1, line2 in zip(file1, file2):
        lines_file1.append(line1)
        lines_file2.append(line2)
        # once 100 line pairs are collected, show their differences
        if len(lines_file1) == CHUNK_SIZE:
            # ndiff expects lists of lines, not joined strings
            diff = difflib.ndiff(lines_file1, lines_file2)
            print(''.join(diff))
            lines_file1 = []
            lines_file2 = []

# show the differences for any lines left over
diff = difflib.ndiff(lines_file1, lines_file2)
print(''.join(diff))
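
One limitation of comparing fixed chunks is that a single inserted or deleted line shifts everything after it, so later chunks no longer line up. A possible alternative, which I have not timed on files like the ones in the question, is difflib.unified_diff. It compares whole lines only and skips the character-level comparison inside changed lines that ndiff performs, which is usually the expensive part. The sketch below assumes both files fit in memory; the function name unified_file_diff is made up for illustration.

```python
import difflib


def unified_file_diff(path1, path2, context=3):
    """Return a unified diff of two text files as a single string.

    The result is an empty string when the files are identical.
    """
    with open(path1, 'r') as f1, open(path2, 'r') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # unified_diff works on lists of lines and yields the diff lazily
    diff = difflib.unified_diff(
        lines1, lines2,
        fromfile=path1, tofile=path2,
        n=context,  # number of unchanged context lines around each change
    )
    return ''.join(diff)
```

Calling print(unified_file_diff('1.csv', '2.csv')) would then show only the changed regions, with removed lines prefixed by - and added lines by +.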

Hope it helps.