Diffing two files while ignoring any missing lines

Posted: 2015-03-16 20:03:27

Tags: python

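I'm diffing two unsorted files. My problem is that when one file gets a line that doesn't exist in the other, the diff gets thrown off. I want to write/print the line that doesn't exist and then continue.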

import re
f1=open("file1","r")
f2=open("file2","r")
f=open("output","w")
test_lines=f1.readlines()
correct_lines=f2.readlines()

for test, correct in zip(sorted(test_lines), sorted(correct_lines)):
    if test.strip().split("(")[0].replace(" ","").strip() != correct.strip().split("(")[0].replace(" ","").strip() and test!="\n":
        print("Oh no! Expected %r; got %r." % (correct, test))
    else:
        towrite=correct + test
        f.write(towrite)

else:
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")
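The drift comes from zip(): it pairs the two sorted lists purely by position, so a line that exists in only one file shifts every later pair by one, and zip() silently stops at the end of the shorter list. A tiny illustration with made-up lists (not the question's files):

```python
# Illustrative data (not from the question): "b" exists only in the second list.
first = sorted(["a", "c", "d"])
second = sorted(["a", "b", "c", "d"])

# zip() pairs items purely by position, so once "b" appears on one side,
# every later pair is shifted by one.
pairs = list(zip(first, second))
print(pairs)  # [('a', 'a'), ('c', 'b'), ('d', 'c')]

# zip() also stops at the shorter list, so the second list's "d" is never paired.
mismatches = [(x, y) for x, y in pairs if x != y]
print(mismatches)  # [('c', 'b'), ('d', 'c')]
```

One extra line on one side turns every later comparison into a false mismatch, which is the "everything fails" symptom.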

Sample input

file1

 jack
tom
apple
orange

file2

jack
apple
ape
 mike

It prints "Oh no! Expected apple; got ape." and after that everything fails.
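A minimal sketch of the "print the missing line and continue" behaviour, assuming both files fit in memory and reusing the question's normalization. Both lists are sorted and then walked together like the merge step of merge sort: when the current lines differ, the one that sorts lower cannot appear later in the other list, so it is reported and only that side advances. `normalize`, `merge_diff`, and the "missing"/"unexpected" labels are hypothetical names; skipping blank lines mirrors the question's `test != "\n"` check.

```python
def normalize(line):
    # Same key as the question's comparison: drop everything from the first "(",
    # remove all spaces, and strip surrounding whitespace (including the newline).
    return line.strip().split("(")[0].replace(" ", "").strip()


def merge_diff(test_lines, correct_lines):
    """Walk two sorted lists in step, reporting unmatched lines and continuing."""
    # Blank lines are skipped (assumption, mirroring the question's check).
    test = sorted(normalize(l) for l in test_lines if l.strip())
    correct = sorted(normalize(l) for l in correct_lines if l.strip())
    report = []
    i = j = 0
    while i < len(test) and j < len(correct):
        if test[i] == correct[j]:
            # Matching pair: advance both sides.
            i += 1
            j += 1
        elif test[i] < correct[j]:
            # test[i] sorts before everything left in correct, so it has no partner.
            report.append(("unexpected", test[i]))
            i += 1
        else:
            # correct[j] sorts before everything left in test, so it is missing.
            report.append(("missing", correct[j]))
            j += 1
    # Whatever remains on either side has no counterpart.
    report.extend(("unexpected", t) for t in test[i:])
    report.extend(("missing", c) for c in correct[j:])
    return report


# The sample input from the question, as lists of lines.
file1 = [" jack\n", "tom\n", "apple\n", "orange\n"]
file2 = ["jack\n", "apple\n", "ape\n", " mike\n"]
for kind, line in merge_diff(file1, file2):
    print(kind, line)
```

On the sample input this reports ape and mike as missing and orange and tom as unexpected. Unlike a set, the walk also respects duplicate lines, since each occurrence is matched at most once.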

1 answer:

Answer (score: 1)

Based on your comment, you don't want a line-by-line comparison. I think Python's set fits your case best. Here's a snippet:

def normalize(line):
    # Drop everything from the first "(", remove all spaces, trim whitespace/newline
    return line.strip().split("(")[0].replace(" ", "").strip()

with open("file1") as f1, open("file2") as f2:
    test_lines = {normalize(l) for l in f1}
    correct_lines = {normalize(l) for l in f2}

# In file2 but not in file1, and in file1 but not in file2
print("Expected:", sorted(correct_lines - test_lines))
print("Got:", sorted(test_lines - correct_lines))

Output:

Expected: ['ape', 'mike']
Got: ['orange', 'tom']
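One trade-off of the set-based snippet above: duplicate lines collapse, so a line that appears twice in one file and once in the other is never reported. If counts matter, a sketch using collections.Counter (multiset subtraction) instead; `counter_diff` is a hypothetical helper, and skipping blank lines is an assumption.

```python
from collections import Counter


def normalize(line):
    # Same normalization as the answer: cut at the first "(", remove spaces, trim.
    return line.strip().split("(")[0].replace(" ", "").strip()


def counter_diff(test_lines, correct_lines):
    # Count each normalized line; blank lines are skipped (assumption).
    test = Counter(normalize(l) for l in test_lines if l.strip())
    correct = Counter(normalize(l) for l in correct_lines if l.strip())
    # Counter subtraction keeps only positive counts, so each result holds the
    # lines (with multiplicity) that one side has more of than the other.
    expected = sorted((correct - test).elements())
    got = sorted((test - correct).elements())
    return expected, got


expected, got = counter_diff(["jack\n", "jack\n", "tom\n"], ["jack\n", "ape\n"])
print("Expected:", expected)  # Expected: ['ape']
print("Got:", got)            # Got: ['jack', 'tom']
```

Here the extra "jack" is reported, whereas the set version would list only "tom" under Got.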