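I am comparing two unsorted files. My problem is that as soon as one file contains a line that does not exist in the other, the comparison falls apart. I want to write/print the missing line and then carry on.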
import re
f1=open("file1","r")
f2=open("file2","r")
f=open("output","w")
test_lines=f1.readlines()
correct_lines=f2.readlines()
for test, correct in zip(sorted(test_lines), sorted(correct_lines)):
    if test.strip().split("(")[0].replace(" ","").strip() != correct.strip().split("(")[0].replace(" ","").strip() and test!="\n":
        print "Oh no! Expected %r; got %r." % (correct, test)
    else:
        towrite=correct + test
        f.write(towrite)
else:
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print "Test file had too much data."
    elif len_diff < 0:
        print "Test file had too little data."
    else:
        print "Everything was correct!"
Sample input
file1
jack
tom
apple
orange
file2
jack
apple
ape
mike
It prints "Oh no! Expected apple; got ape." and then everything after that fails.
Answer 0 (score: 1)
Based on your comment, you do not want a line-by-line comparison. I think Python's set is the best fit for your case. Here is a snippet:
import re
f1=open("file1","r")
f2=open("file2","r")
f=open("output","w")
test_lines=f1.readlines()
correct_lines=f2.readlines()
test_lines = set([l.strip().split("(")[0].replace(" ","").strip() for l in test_lines])
correct_lines = set([l.strip().split("(")[0].replace(" ","").strip() for l in correct_lines])
print "Expected: ", correct_lines-test_lines
print "Got: ", test_lines-correct_lines
Output:
Expected: set(['mike', 'ape'])
Got: set(['orange', 'tom'])
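The same idea can be wrapped in a small reusable function. This is only a sketch, written in Python 3 syntax (unlike the Python 2 snippet above). The names `normalize` and `compare_lines` are made up for illustration and are not part of the original code. It assumes duplicates and line order do not matter, since sets discard both, and it returns sorted lists so the result is deterministic.

```python
def normalize(line):
    # Same key the snippet above builds: the text before the first "(",
    # with all spaces removed.
    return line.strip().split("(")[0].replace(" ", "").strip()


def compare_lines(test_lines, correct_lines):
    """Return (missing, unexpected) as sorted lists of normalized keys.

    missing    -> keys present in correct_lines but not in test_lines
    unexpected -> keys present in test_lines but not in correct_lines
    """
    # Blank lines are skipped, mirroring the test != "\n" check in the question.
    test_keys = {normalize(l) for l in test_lines if l.strip()}
    correct_keys = {normalize(l) for l in correct_lines if l.strip()}
    missing = sorted(correct_keys - test_keys)
    unexpected = sorted(test_keys - correct_keys)
    return missing, unexpected


if __name__ == "__main__":
    # Sample data from the question (file1 is the test file, file2 the reference).
    test_lines = ["jack\n", "tom\n", "apple\n", "orange\n"]
    correct_lines = ["jack\n", "apple\n", "ape\n", "mike\n"]
    missing, unexpected = compare_lines(test_lines, correct_lines)
    print("Expected:", missing)
    print("Got:", unexpected)
```

To work on real files, pass the results of `readlines()` on file1 and file2 to `compare_lines`, then write the two returned lists to the output file instead of printing them.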