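I have the following code to compare two files. I want this program to work when I point it at files as large as 4 or 5 MB. When I do that, the prompt cursor in the Python console just blinks and no output is shown. Once, I let it run all night and it was still blinking the next morning. What can I change in this code?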
import difflib
file1 = open('/home/michel/Documents/first.csv', 'r')
file2 = open('/home/michel/Documents/second.csv', 'r')
diff = difflib.ndiff(file1.readlines(), file2.readlines())
delta = ''.join(diff)
print delta
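Answer 0 (score: 0)
If you are on a Linux-based system, you can call the external diff command and use its output. I tried the diff command on two files of 14 MB and 9.3 MB, and it took 1.3 seconds.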
real 0m1.295s
user 0m0.056s
sys 0m0.192s
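Calling diff from Python and capturing its output could look roughly like the sketch below. The helper name `external_diff` is made up for illustration, and the sketch assumes Python 3.5+ and that a `diff` executable is on the PATH. Note that diff exits with status 1 when the files differ, so only statuses above 1 are treated as errors.

```python
import subprocess


def external_diff(path_a, path_b):
    """Run the system `diff` on two files and return its text output."""
    # diff exits with 0 (identical), 1 (files differ) or 2 (trouble),
    # so a non-zero status is not necessarily an error.
    result = subprocess.run(
        ["diff", path_a, path_b],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

For the files in the question, this would be called as `external_diff('/home/michel/Documents/first.csv', '/home/michel/Documents/second.csv')`. An empty string means the files are identical.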
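Answer 1 (score: 0)
I ran into the same problem when I tried to use difflib the way you do, because for large files difflib holds both files in memory in full and then compares them in one pass. As a workaround, you can compare the two files piece by piece. Here I do it 100 lines at a time.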
import difflib

file1 = open('1.csv', 'r')
file2 = open('2.csv', 'r')
lines_file1 = []
lines_file2 = []
# i: line number (0-based)
# line: tuple holding the i-th line of each file
# note: zip stops at the end of the shorter file
# (on Python 2, itertools.izip avoids building the full list first)
for i, line in enumerate(zip(file1, file2)):
    lines_file1.append(line[0])
    lines_file2.append(line[1])
    # once 100 lines are collected, diff them and start a new batch
    if (i + 1) % 100 == 0:
        # pass lists of lines; given plain strings,
        # ndiff would compare character by character
        diff = difflib.ndiff(lines_file1, lines_file2)
        print(''.join(diff))
        lines_file1 = []
        lines_file2 = []
# show the differences for any lines left over
diff = difflib.ndiff(lines_file1, lines_file2)
print(''.join(diff))
file1.close()
file2.close()
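The same idea can also be written as a generator that reads each file in slices, which additionally covers the case where one file is longer than the other. This is only a sketch: the name `chunked_ndiff` and the `chunk_size` parameter are made up for illustration. Keep in mind that lines are only matched within the same chunk, so a line inserted early in one file shifts every later chunk and will show up as many spurious differences. The approach works best when the two files mostly line up.

```python
import difflib
from itertools import islice


def chunked_ndiff(path_a, path_b, chunk_size=100):
    """Yield ndiff output lines, comparing the files chunk_size lines at a time."""
    with open(path_a) as file_a, open(path_b) as file_b:
        while True:
            # islice pulls at most chunk_size lines without reading the rest
            lines_a = list(islice(file_a, chunk_size))
            lines_b = list(islice(file_b, chunk_size))
            if not lines_a and not lines_b:
                break  # both files are exhausted
            # if one file is longer, its extra lines are diffed against []
            for out_line in difflib.ndiff(lines_a, lines_b):
                yield out_line
```

To print only the lines that changed, something like `for l in chunked_ndiff('1.csv', '2.csv'): if l[0] in '+-': print(l.rstrip())` would do.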
Hope it helps.